INTEGRATED AUTHORING AND
TRANSLATION SYSTEM

This is a continuation application of application Ser. No. 
08/363,309, filed Dec. 22, 1994. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to computer-based
document creation and translation systems and, more
particularly, to a system for authoring and translating 
constrained-language text to a foreign language with no pre- 
or post-editing required. 

2. Related Art 

Every organization whose activities require the generation 
of vast quantities of information in a variety of documents 
is confronted with the need to ensure their full intelligibility. 
Ideally, such documents should be authored in simple, direct 
language featuring all necessary expressive attributes to 
optimize communication. This language should be consis- 
tent so that the organization is identified through its single, 
stable voice. This language should be unambiguous.

The pursuit of this kind of writing excellence has led to 
the implementation of various disciplines designed to bring 
the authoring process under control. Yet authors of varied 
capabilities and backgrounds cannot comfortably be made to 
fit a uniform skill standard. Writing guidelines, rules and 
standards are elusive: difficult to define and enforce. Efforts
aimed at both standardizing and improving on the quality of 
writing tend to meet with mixed results. However achieved 
and however successful, these results push up documenta- 
tion authoring costs. 

Recent attempts at surrounding authors with the software 
environment that might enhance their productivity and the 
quality of their writing have only succeeded in providing 
spell checkers. The effectiveness of other writing software 
has so far been disappointingly weak. 

When the need to deliver information calls for the cross- 
ing of linguistic frontiers, the challenges multiply. The 
organization that needs to clear a channel for its information 
flow finds itself to a great extent, if not totally, dependent on 
translation. 

Translation of text from one language to another language 
has been done for hundreds of years. Prior to the advent of 
computers, such translation was done completely manually 
by experts, called translators, who were fluent in the lan- 
guage of the original text (source text) and in the language 
of the translated text (target text). Typically, it was preferable 
for the translator to have originally learned the target lan- 
guage as his/her native tongue and subsequently have 
learned the source language. Such an approach was felt to 
result in the most accurate and efficient translation. 

Even the most expert translator must take a considerable 
amount of time to translate a page of text. For example, it is 
estimated that an expert translator translating technical text 
from English to Japanese can only translate approximately 
300 words (approximately one page) per hour. It can thus be 
seen that the amount of time and effort required to translate 
a document, particularly a technical one, is extensive. 

The requirements for translation in business and commerce
have grown steadily in the last hundred years. This is
due to several factors. One is the rapid increase in the text 
associated with conducting business internationally. Another 
is the large number of languages that such texts must be 
translated into in order for a company to engage in global
commerce. A third is the rapid pace of commerce, which has
resulted in frequent revisions of text documents, which
require subsequent translation of new versions.

Many organizations have the responsibility for creating
and distributing information in multiple languages. In the
global marketplace, the manufacturer must ensure that its
manuals are widely available in the host languages of its
target markets. Manual translation of documents into foreign
languages is a costly, time-consuming, and inefficient process.
Translations are usually inconsistent owing to the
individual interpretation of the translators, who are not
necessarily well-versed in the application-specific language
used in the documentation. Because of these problems,
fewer manuals than would be ideal are actually translated.

In the areas of research and development, the explosion of
knowledge which has occurred in the last century has also
geometrically increased the need for the translation of
documents. No longer is there one predominant language for
documents in a particular field of research and development.
Typically, such research and development activities are
taking place in several advanced industrialized countries,
such as, for example, the United States, United Kingdom,
France, Germany, and Japan. Many times there are additional
languages containing important documents relating to
the particular area of research and development. Advances in
technology, particularly in electronics and computers, have
further accelerated the production of text in all languages.

The ability to produce text is directly proportional to the
capability of the technology that is used. When documents
had to be hand-written, for example, an author could only
produce a certain number of words per unit of time. This
increased significantly, however, with the advent of
mechanical devices, such as typewriters, mimeograph
machines, and printing presses. The advent of electronic,
computer, and optical technology increased the capability of
the author even further. Today, an average author can
produce significantly more text in a given unit of time than
any author could produce using the hand-written methods of
the past.

This rapid increase in the amount of text, coupled with
enormous advances in technology, has caused considerable
attention to be paid to the subject of translation of text from
its source language to a target language(s). Considerable
research has been done in universities as well as in private
and governmental laboratories, which has been devoted to
trying to figure out how translation can be accomplished
without the intervention of a human translator.

Computer-based systems have been devised which
attempt to perform machine translation (MT). Such computer
systems are programmed so as to attempt to automatically
translate source text as an input into target text as an
output. However, researchers have discovered that such
computer systems for automatic machine translation are
impossible to implement using present technology and
theoretical understanding. No system exists today which can
perform the machine translation of a source natural language
to a target natural language without some type of editing by
expert editors/translators. One method is discussed below.

In a process called pre-editing, source text is initially
reviewed by a source editor. The task of the source editor is
to make changes to the source text so as to bring it into
conformance with what is known to be the optimal state for
translation by the machine translation system. This conformance
is learned by the source editor through trial and error.
The pre-editing process just described may go through
iterations by additional source editors of increasing competence.
The source text thus prepared is submitted for processing
to the machine translation system. The output is
target language text which, depending on the purposes of the
translation or quality requirements of the user, may or may
not be post-edited.

If the translation quality required must be comparable to
that of proficient human translation, the output of machine
translation will most likely have to be post-edited by a
competent translator. This is due to the complexity of human
language and the comparatively modest capabilities of the
machine translation systems that can be built with present
technology, within natural limitations of time and resources,
and with a reasonable expectation of meeting
cost-effectiveness requirements. Most of the modest systems that
are built require, indeed, the post-editing activity, intended
to approximate, by whatever measure, the quality levels of
purely human translation.

One such system is the KBMT-89 designed by the Center
for Machine Translation, Carnegie Mellon University, which
translates English to Japanese and Japanese to English. It
operates with a knowledge based domain model which aids
in interactive disambiguation (i.e., editing of the document
to make it unambiguous). However, this interactive
disambiguation is not typically done interactively with an author.
Once the system finds an ambiguous sentence that it cannot
disambiguate, it must stop the process and resolve ambiguities
by asking an author/translator a series of multiple-choice
questions. In addition, since the KBMT-89 does not utilize
a well-defined controlled input language, the so-called
translator assisted interactive disambiguation produces text
which requires post-editing.

In view of the above, it would be advantageous to have a
translation system that eliminates both pre- and post-editing.

SUMMARY OF THE INVENTION

The present invention is a system of integrated, computer-based
processes for monolingual document development
and multilingual translation. An interactive computerized
text editor enforces lexical and grammatical constraints on a
natural language subset used by the authors to create their
text, and supports the authors in disambiguating their text to
ensure its translatability. The resulting translatable source
language text undergoes machine translation into any one of
a set of target languages, without the translated text requiring
any post-editing.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1(a) and 1(b) are high level block diagrams of the
architecture of the present invention.

FIG. 2 is a high level flowchart of the operation of the
present invention.

FIG. 3 is a high level informational flow and architectural
block diagram of MT 120.

FIG. 4 shows an example of an information element.

FIG. 5 is a block diagram of the domain model 500.

FIG. 6 is a high level flow diagram of the operation of the
language editor 130.

FIG. 7 is a flow diagram illustrating the operation of the
vocabulary checker 610.

FIG. 8 is a high level flow diagram of the disambiguation
block 630.

FIG. 9 is an informational flow and architectural block
diagram of MT 120.

DETAILED DESCRIPTION OF THE PRESENT
INVENTION

I. Integrated System Overview

The computer-based system of the present invention
provides functional integration of:

1) An authoring environment for the development of
documents, and

2) A module for accurate, machine translation into multiple
languages without pre- or post-editing.

Utilizing this technology in the production of multilingual
documentation, the user is assured of consistently accurate,
cost-efficient translation, whether in small or large
volumes, and with virtually simultaneous release of information
in both the source language and the languages
targeted for translation.

The integration of the language authoring function
together with the translation function is based on two
principles:

1) In a multinational, multilingual business environment,
the information is not considered to be fully developed
until it is deliverable in the various languages of the
users.

2) Combining the authoring and translation processes
within a unified framework leads to efficiency gains
that cannot otherwise be achieved.

FIG. 1(a) shows a high level block diagram of the
Integrated Authoring and Translation System (IATS) 105.
The IATS 105 provides a specialized computing environment
dedicated to supporting an organization in authoring
documentation in one language and translating it into various
others. These two distinct functions are supported by an
integrated group of programs, as follows:

1) Authoring: one subgroup of the programs provides an
interactive computerized Text Editor (TE) 140 which
enables authors to create their monolingual text within
the lexical and grammatical constraints of a domain-bound
subset of a natural language, the subset designated
Constrained Source Language (CSL).
Additionally, the TE 140 enables authors to further
prepare the text for translation by guiding them through
the process of text disambiguation which renders the
text translatable without pre-editing;

2) Translation: another subgroup of the programs provides
the Machine Translation (MT) 120 function,
capable of translating the CSL into as many target
languages as the generator module has been programmed
to generate, with the resulting translation
requiring no post-editing.

For a system that features translation as a central
component, the integration of the authoring and the translation
functions of the present invention within a unified
framework is the only way devised to date that eliminates
both pre- and postediting.

The text editor 140 is a set of tools to support the authors
and editors in creating documents in CSL. These tools will
help authors to use the appropriate CSL vocabulary and
grammar to write their documents. The TE 140 communicates
with the author 160 (and vice versa) directly. Referring
to FIG. 1(b), the IATS 105 is divided into four main parts to
perform the authoring and translation functions: (1) a
Constrained Source Language (CSL) 133, (2) a Text Editor
140, (3) a MT 120, and (4) a Domain Model (DM) 137. The
Text Editor 140 includes a Language Editor 130 and a
Graphics Editor 150. In addition, a File Management System
(FMS) 110 is also provided for controlling all processes.

The CSL 133 is a subset of a source language whose
grammar and vocabulary cover the domain of the author's
documentation which is to be translated. The CSL 133 is
defined by specifications of the vocabulary and grammatical
constructions allowed so that the translation process is made
possible without the aid of pre- and post-editing.
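
As a rough illustration of what checking text against a vocabulary specification might involve, the following minimal Python sketch flags words that fall outside a small word list and looks up suggested replacements. The word list, the synonym table, and the function name `check_vocabulary` are hypothetical and chosen only for illustration; the actual CSL specification and checker are not reproduced here, and grammar checking and inflected word forms are ignored.

```python
import re

# Hypothetical CSL vocabulary and synonym table (assumptions for illustration;
# a real CSL specification is domain-specific and far larger).
CSL_VOCABULARY = {"see", "the", "engine", "from", "flywheel", "end"}
CSL_SYNONYMS = {"view": ["see"]}


def check_vocabulary(text, vocabulary=CSL_VOCABULARY, synonyms=CSL_SYNONYMS):
    """Return (word, suggestions) for each distinct word outside the vocabulary."""
    findings = []
    seen = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in vocabulary or word in seen:
            continue
        seen.add(word)
        # An empty suggestion list means no in-vocabulary synonym is known.
        findings.append((word, synonyms.get(word, [])))
    return findings


print(check_vocabulary("View the engine from the flywheel end"))
# [('view', ['see'])]
```

In this sketch a word with no known synonym is still reported, with an empty suggestion list, so that the author can rephrase it by hand.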

The TE 140 is a set of tools to support authors and editors
in creating documents in CSL. These tools will help authors
to use the appropriate CSL vocabulary and grammar to write
their documents. The LE 130 communicates with the author
160 (and vice versa) via the text editor 140. The author has
bi-directional communication via line 162 with the text
editor 140. The LE 130 informs the author 160 whether
words and phrases that are used are in CSL. The LE 130 is
able to suggest synonyms in CSL for words that are relevant
to the domain of information which includes this document,
but are not in CSL. In addition, the LE 130 tells an author
160 whether or not a piece of text satisfies CSL grammatical
constraints. It also provides an author with support in
disambiguating sentences that may be syntactically correct
but are semantically ambiguous.

The MT 120 is divided into two parts: a MT analyzer 127
and a MT generator 123. The MT analyzer 127 serves two
purposes: it analyzes a document to ensure that the document
unambiguously conforms to CSL and produces interlingua
text. The analyzed CSL-approved text is then translated
into a selected foreign (target) language 180. The MT
120 utilizes an Interlingua-based translation approach.
Instead of directly translating a document to another foreign
language, the MT generator 123 transforms the document
into a language-independent, computer-readable form called
Interlingua and then generates translations from the Interlingua
text. As a result, translated documents will require no
postediting. A version of the MT 120 is created for each
language and will consist primarily of a set of knowledge
sources designed to guide the translation of Interlingua text
to foreign language text. In particular, for every new target
language, a new MT generator 123 must be individually
developed.

When fully functional, the LE 130 will sometimes need to
ask the author 160 to choose from alternative interpretations
for certain sentences that satisfy CSL grammatical constraints
but for which the meaning is unclear. This process is
known as disambiguation. After the LE 130 has determined
that a particular part of text uses only CSL vocabulary and
satisfies all CSL grammatical constraints, then the text will be
labeled CSL-approved, pending this disambiguation. As
explained below, disambiguation will not require any
changes to the author-visible aspects of the text. After the
text has been disambiguated it will be ready for translation
into the target language 180.

In practice, the LE 130 is built as an extension to the text
editor 140 which provides the basic word processing functionality
required by authors and editors to create text and
tables. The graphics editor 150 is used for creating graphics.
The graphics editor 150 provides a means for accessing the
text labels on graphics through the text editor 140, so these
text labels can be CSL-approved as well.

The LE 130 (via text editor 140) communicates with the
MT analyzer 127 and, through it, with the DM 137 during
disambiguation via bidirectional socket-to-socket lines 132
and 133. In the preferred embodiment of the present
invention, the DM is one of the knowledge bases that feeds
the MT analyzer 127. The DM 137 is a symbolic representation
of the declarative knowledge about the CSL vocabulary
used by the MT analyzer 127 and the LE 130.

FIG. 2 shows a high level flowchart of the operation of
IATS 105. The MT 120, LE 130, text editor 140, and
graphics editor 150 are all controlled by the FMS 110.
Control lines 111-113 provide the necessary control information
for proper operation of IATS 105.

Initially, the author 160 will use the FMS 110 to choose
a document to edit, and the FMS 110 will start the text editor
140, displaying the file for the specified document. Via the
text editor 140, the author enters text, which may be
unconstrained and ambiguous, into the IATS 105, as shown in
blocks 160 and 220. The author 160 will use standard editor
commands to create and modify the document until it is
ready to be checked for CSL compliance. Note that it is
anticipated that authors will mostly enter text that is
substantially prepared with the CSL constraints in mind. The
text will then be modified by the author in response to
system feedback, based on violations of the pre-determined
lexical and grammatical constraints, to conform to the CSL.
This is, of course, much more efficient than initially entering
totally unconstrained text. However, the system will operate
properly even if totally unconstrained text is entered from
the start.

The author's communication with the LE 130 consists of
mouse click or keystroke commands. However, one should
note that other forms of input may be used, such as but not
limited to the use of a stylus, voice, etc., without changing
the scope or function of the present invention. An example
of an input is a command to perform a CSL check or to find
the definition and usage example for a given word or phrase.

The CSL text that may contain residual ambiguity or
stylistic problems is analyzed for conformity with CSL and
checked for compliance with the grammatical rules contained
in the knowledge bases, as shown in block 230. The
author is provided feedback to correct any mistakes via
feedback line 215. Specifically, the LE 130 provides information
regarding non-CSL words and phrases and sentences
to the author 160. Finally, the text is checked for any
ambiguous sentences. The LE prompts the author to select
an appropriate interpretation of a sentence's meaning. This
process is repeated until the text is fully disambiguated.

Once the author has made all the necessary corrections to
the text, and the analysis phase 230 has completed, the
disambiguated/constrained text 240 is passed to the MT
analyzer and interpreter 250. The interpreter resides in the
MT analyzer 127 together with the syntactic part of the
analyzer and translates the disambiguated/constrained text
240 into interlingua 260. The interlingua 260 is in turn
translated by generator block 270 into the target text 280. As
shown in FIG. 3, the interlingua text 260 is in a form that can
be translated to multiple target languages 306-310.

By requiring and enabling the author to create documents
that conform to specific vocabulary and grammatical
constraints, it is feasible to perform the accurate translation
of constrained-language texts to foreign languages with no
postediting required. Postediting is not required since the LE
vocabulary check block 217 and analysis block 230 have
caused the author to modify and/or disambiguate all possibly
ambiguous sentences and all non-translatable words from
the document before translation.

II. Detailed Description of the Functional Blocks

In a preferred embodiment, each author will have sole use
of a DECstation with 32 Meg of RAM, a 400-megabyte disk
drive, and a 19-inch color monitor. Each workstation will be
configured for at least 100 Meg of swap from its local disk.
In addition to the authors' workstations, DECservers will be
used as file servers, one for every two authoring groups, for
a total of no more than 45 users per file server. Furthermore,
authoring workstations will reside on an Ethernet local
network. The system uses the Unix operating system (a
Berkeley Software Distribution (BSD) derivative is preferable
to a System V (SYSV) derivative). A C programming
language compiler and OSF/Motif libraries are available.
The LE will be run within a Motif window manager. It
should be noted that the present invention is not limited to
the above hardware and software platforms and other platforms
are contemplated by the present invention.

A. Text Editor

The preferred embodiment of the present invention provides
a text editor 140 which allows the author to input
information that will eventually be analyzed and finally
translated into a foreign language. Any commercially available
word processing software can be used with the present
invention. A preferred embodiment uses a SGML text editor
140 provided by ArborText (ArborText Inc., 535 West
William St., Ann Arbor, Mich. 48103). The SGML text
editor 140 provides the basic word processing functionality
required by authors and editors, and is used with software by
InterCap (of Annapolis, Md.) for creating graphics.

The present invention utilizes a SGML text editor 140
since it creates text using Standard Generalized Markup
Language (SGML) tags. SGML is an International Standard
markup language for describing the structure of electronic
documents. It is designed to meet the requirements for a
wide range of document processing and interchange tasks.
SGML tags enable documents to be described in terms of
their content (text, images, etc.) and logical structure
(chapters, paragraphs, figures, tables, etc.). In the case of
larger, more complex, electronic documents, it also makes it
possible to describe the physical organization of a document
into files. SGML is designed to enable documents of any
type, simple or complex, short or long, to be described in a
manner that is independent of both the system and application.
This independence enables document interchange
between different systems for different applications without
misinterpretation or loss of data.

SGML is a markup language, that is, a language for
"marking up" or annotating text by means of or by using
coded information that adds to the conventional textual
information conveyed by a given piece of the text. In most
cases it takes the form of sequences of characters at various
points throughout an electronic document. Each sequence is
distinguishable from the text around it by the special characters
that begin and end it. The software can verify that the
correct markup has been inserted into the text by examining
the SGML tags upon request. The markup is generalized in
that it is not specific to any particular system or task. For a
more in depth discussion of SGML tags see International
Standard (ISO) 8879, Information processing - Text and
office systems - Standard Generalized Markup Language
(SGML), Ref. No. ISO 8879-1986(E).

The following capabilities are possible due to the use of
the SGML tags:

(1) dividing documents into fragments or translatable
units. The text editor 140 software uses both punctuation
and SGML tags to recognize translatability units in
the source input text (e.g., an SGML tag is necessary to
identify section titles);

(2) shielding (insulating) units that will not be translated.
Although the system is based on the premise that all
words and sentences will belong to the constrained
language, there are items that cannot be predicted in
advance (for example, names and addresses) or classes of
vocabulary that cannot (readily) be exhaustively specified (for
example, part numbers, error messages from
machinery). SGML tags can be put around these items
to indicate to the system that they are exempt from
checking;

(3) identifying contents (e.g., part number) as discussed in
(2);

(4) allowing partial sentences to be translated (e.g.,
bulleted items);

(5) assisting in translating tables (one cell at a time) by
identifying structure of text. This feature is similar to
that described in (4);

(6) assisting the parsing process (described below)
through (2), (3), (4), (5);

(7) assisting in disambiguation by providing a means of
inserting invisible tags into the source text so as to
indicate the correct interpretation of an ambiguous
sentence;

(8) assisting in translating currencies and mathematical
units through the identification of specific types of text
that require special treatment;

(9) providing a means of labeling a portion of text as
translatable. In other words, certifying that a portion of
text has advanced through the process outlined below
and that the text is unambiguous constrained text that
can be translated without postediting.

In the past, authors have created (by way of the text editor
140) electronic documents (text only, no graphics) that
represented a complete "book." This implies that all work is
done by one writer, and that the information created is not
easily reused. The present invention, however, compiles (or
creates) books (manuals, documents) from a set of smaller
pieces or information elements, which implies that the work
can be done by multiple writers. The result of this invention
is enhanced reusability. An information element is defined as
the smallest stand-alone piece of service information about
a specialized domain. It should be noted, however, that
although a preferred embodiment utilizes information
elements, the present invention can produce accurate,
unambiguous translated documents without the use of information
elements.

FIG. 4 shows an example of an information element 410
which includes a "unique" heading 415, a "unique" block of
text 420, a "shared" graphic 430, a "shared" table 435, and
a "shared" block of text 425.

"Unique" information is that information which applies
only to the information element in which it is contained. This
implies that the "unique" information is filed as part of the
information element 450.

A "shared" object (a graphic, table, or block of text) is
information that is "referenced" in the information element.
The content of "shared" objects is displayed in the authoring
tool, but only "pointed to" in the filed information
element 450.

"Shared" objects differ from information elements in that
they do not stand alone (i.e., they do not convey enough
information by themselves to impart substantive
information). Each "shared" object is in itself a separate file
as shown in block 450.

Information elements are formed by combining "unique"
blocks of information (text and/or tables) with one or more
"shared" objects. Note that "unique" heading 415 and
"unique" text 420 are combined with "shared" graphic 430,
"shared" table 435, and "shared" text 425. A set of one or
more information elements makes up a complete document
(book).

"Shared" objects are stored in "shared" libraries. Library
types include "shared" graphic libraries 460a, "shared"
tables libraries 460b, "shared" text libraries 460c, "shared"
audio libraries 460d, and "shared" video libraries 460e. A
shared object is stored only one time. When used in individual
information elements, only "pointers" to the original
shared object will be placed in the information shared file
450. This minimizes the amount of disk space that will be
required. When the original object is changed, all those
information elements that "point" to that object are
automatically changed. A shared object can be used in any
publication type.

A "shared information element" is an information element
that is used in more than one document. For example, the
same four information elements in release library 470 are
used to create portions of documents 480 and 485.

All communication between the author and the LE 130
will be mediated by an LE User Interface (UI), implemented
as either an extension of standard SGML Editor facilities
such as menu options, or in separate windows. The UI
provides and manages access to and control of the CSL
checkers and CSL vocabulary look-up, and it is the primary
tool enabling users to interact with the CSL LE. Although
the term "user interface" is often used in a more general
sense to refer to the interface to an entire software system,
here the term will be restricted to mean the interface to the
CSL checkers, vocabulary look-up facility, and the
disambiguation facility.

Among other things, the UI must provide clear information
regarding (a) the actions the LE is taking, (b) the result
of these actions, and (c) any ensuing actions. For example,
whenever an action initiated through the UI introduces more
than a very brief, real-time pause, the UI should inform the
author of a possible delay by means of a succinct message.

The author can invoke LE functionality by choosing an
option from a pull-down menu in text editor 140. The
available options allow the author to initiate and view
feedback from CSL checking (both vocabulary and grammar
checking) and from vocabulary look-up. The author can
request that checking be initiated on the currently displayed
document or request vocabulary look-up on a given word or
phrase.

The UI will clearly indicate each instance of non-CSL
language found in the document. Possible ways of indicating
non-CSL language include the use of color and changes to
font type or size in the SGML Editor window. The UI will
display all known information regarding any non-CSL word.
For example, in appropriate cases the UI will display a
message saying that the word is non-CSL but has CSL

form to provide. Consider the following caption, in the case
where the verb "view" is not in CSL, but has the CSL
synonym "see":

Direction of Crankshaft Rotation (when viewed from
flywheel end)

The Vocabulary Checker will not know if "saw" or "seen"
should be offered as a synonym for "viewed." Of course, in
this case a reasonable course of action might be to offer both
possibilities and allow the author to choose the appropriate
one. Because there is no certainty that every case will allow
a presentation that enables the author to order a direct
replacement, the LE 130 provides a list of replacement options
in the correct form where possible. There may be cases,
though, when the author will find it necessary to edit a
suggested CSL word or phrase before requesting that it be
put into the document.

Finally, the LE UI provides support for disambiguating
the meaning of sentences. It does this by providing a list of
possible alternative interpretations to the author, allows the
author to select the appropriate interpretation, and then tags
the sentence so as to indicate that author's selection.

C. File Management System

The File Management System (FMS) 110 serves as the
authors' interface to the IE Release Library 470 and the SGML
text editor 140. Typically, authors will select an IE to edit by
indicating the file for that IE in the FMS interface. The FMS
110 will then initiate and manage an SGML Editor session
for that IE. Finished documents will be forwarded to a
human editor or Information Integrator via FMS-controlled
facilities.

D. Constrained Source Language (CSL)

Given the complexity of today's technical documentation,
high quality machine translation of natural language
unconstrained texts is practically impossible. The major obstacles
to this are of a linguistic nature. The crucial process in
translating a source text is that of rendering its meaning in
the target language. Because meaning lies under the surface
of textual signals, such overt signals have to be analyzed,
synonyms, as well as a list of those synonyms. 

In cases where a Vocabulary Checker report includes a list 
of alternatives to the non-CSL word in focus (for example, 
spelling alternatives or CSL synonyms), the author will be 
able to select one of those alternatives and request that it be 
automatically replaced in the document. In some cases, the 
author may have to modify (i.e., add the appropriate ending) 
the selected alternative to ensure that it is in the appropriate 
form. 

When an author requests vocabulary information, the UI 
will display spelling alternatives, synonyms, a definition, 55 
and/or a usage example for the item indicated. 

The author can move quickly and easily between checker 
information and vocabulary look-up information inside the 
UI. This enables the author to perform information searches 
(e.g., synonym look-up) during the process of changing the 60 
documents to remove non-CSL language. 

In most cases, the UI provides automatic replacement of 
non-CSL vocabulary with CSL vocabulary, with no need for 
the user to modify the CSL word to ensure that it is in the 
appropriate form. However, there arc some cases in which 65 
the vocabulary checker (described below), which does no 
parsing of a document, will not be able to identify the correct 



The meaning resulting from this analysis is used in the
process of generating the signals of the target language.
Some of the most vexing translation problems result from
those features inherent in language which hinder analysis
and generation.

A few of these features are:
1. Words with more than one meaning in an ambiguous context
Example: Make it with light material.
[Is the material "not dark" or "not heavy"?]
2. Words of ambiguous makeup
Example: The German word "Arbeiterinformation" is either
"information for workers" [Arbeiter+Information] or
"formation of female workers" [Arbeiterin+Formation]
3. Words which play more than one syntactic role
Round may be a noun (N), a verb (V), or an adjective (A):
(N) Liston was knocked out in the first round.
(V) Round off the figures before tabulating them.
(A) Do not place the cube in a round box.
4. Combinations of words which may play more than one
syntactic role each
Example: British Left Waffles on Falklands.
[If Left Waffles is read as N+V, the headline is about the
British Left]




[If Left Waffles is read as V+N, the headline is about the
British]
5. Combinations of words in ambiguous structures
Example: Visiting relatives can be boring.
[Is it the "visiting of relatives" or the "relatives who visit"
which can be boring?]
Example: Lift the head with the lifting eye.
[Is the "lifting eye" an instrument or a feature of the "head"?]
6. Confusing pronominal reference
Example: The monkey ate the banana because it was . . .
[What does "it" refer back to, the monkey or the banana?]

Generation problems add to the above, increasing the overall
difficulty of machine translation.

The magnitude of the translation problems is considerably lessened
by any reductions of the range of linguistic phenomena the language
represents. A sublanguage covers the range of objects, processes
and relations within a limited domain. Yet a sublanguage may be
limited in its lexicon while it may not necessarily be limited in
the power of its grammar. Under controlled situations, a strategy
aimed at facilitating machine translation is that of constraining
both the lexicon and the grammar of the sublanguage.

Constraints on the lexicon limit its size by avoiding synonyms, and
control lexical ambiguity by specializing the lexical units for the
expression of, as far as possible, one meaning per unit. It is easy
to imagine how these restrictions would avoid the problems
exemplified in 1, 2, and 4, above. Grammatical constraints may
simply rule out processes like pronominalization (6 above) or
require that the intended meaning be made clearer either through
addition or repetition of otherwise redundant information or
through rewrite. The following example sets the parameters for
application of this requirement:

Unconstrained, ambiguous English (which can be interpreted as
either A, B1, or B2 below):
Clean the connecting rod and main bearings.
Unambiguous English version A:
Clean the connecting rod bearings and the main bearings.
Unambiguous English version B1:
Clean the main bearings and the connecting rod.
Unambiguous English version B2:
Clean the main bearings and the connecting rods.

The number and types of lexical and grammatical constraints may
vary widely depending on the purpose of development of the
constrained sublanguage.

In view of the above, the present invention limits the authoring of
documents within the bounds of a constrained language. A
constrained language is a sublanguage of a source language (e.g.,
American English) developed for the domain of a particular user
application. For a discussion generally of constrained or
controlled languages see Adriaens et al, From COGRAM to ALCOGRAM:
Toward a controlled English Grammar Checker, Proc. of Coling-92,
Nantes (Aug. 23-28, 1992) which is incorporated by reference. In
the context of machine translation, the goals of the constrained
language are as follows:

1. To facilitate consistent authoring of source documents, and to
encourage clear and direct writing; and

2. To provide a principled framework for source texts that will
allow fast, accurate, and high-quality machine translation of user
documents.

The set of rules that authors must follow to ensure that the
grammar of what they write conforms to CSL will be referred to as
CSL Grammatical Constraints. The computational implementation of
CSL grammatical constraints used to analyze CSL texts in the MT
component will be referred to as the CSL Functional Grammar, based
on the well known formalisms developed by Martin Kay and later
modified by R. Kaplan and J. Bresnan (see Kay, M., "Parsing in
Functional Unification Grammar," in D. Dowty, L. Karttunen and
A. Zwicky (eds.), Natural Language Parsing: Psychological,
Computational, and Theoretical Perspectives, Cambridge, Mass.:
Cambridge University Press, pgs. 251-278 (1985) and Kaplan R. and
J. Bresnan, "Lexical Functional Grammar: A Formal System for
Grammatical Representation," in J. Bresnan (ed.), The Mental
Representation of Grammatical Relations, Cambridge, Mass.: MIT
Press, pgs. 172-281 (1982), both of which are incorporated by
reference).

In the rest of this document we refer frequently to the notion that
a word or phrase may be "in CSL" or "not in CSL." Here we will
describe some assumptions about the kinds of vocabulary
restrictions that will be imposed by CSL, and the meaning of the
expression "in CSL."

The same word or phrase in English can have many meanings; for
example, a general dictionary lists the following definitions for
the word "leak":

(1) verb: to permit the escape of something through a [...];
(2) verb: to disclose information without official authority or
sanction; and
(3) noun: a crack or opening that permits something to escape from
or enter a container or conduit.

Each of these different meanings is referred to as a "sense" of the
word or phrase. Multiple senses for a single word or phrase can
cause problems for an MT system, which doesn't have all the
knowledge that humans use to understand which of several possible
senses is intended in a given sentence. For many words, the system
can eliminate some ambiguity by recognizing the part of speech of
the word as used in a particular sentence (noun, verb, adjective,
etc.). This is possible because each definition of a word is
particular to the use of that word as a certain part of speech, as
indicated above for "leak."

However, to avoid the kinds of ambiguity that the MT 120 cannot
eliminate, the CSL specification strives to include a single sense
of a word or phrase for each part of speech. When a word or phrase
is "in CSL," it can be used in CSL in at least one of its possible
senses. For example, an author writing in CSL may be allowed to use
"leak" in senses (1) and (3) above, but not in sense (2). Saying
that a word or phrase is "in CSL" does not mean that all possible
uses of the word or phrase can be translated.

If a word or phrase is in CSL, then all forms of that word or
phrase that can express its CSL sense(s) are also in CSL. In the
above example, an author may use not only the verb "leak" but also
the related verb forms "leaked," "leaking" and "leaks." If a word
or phrase with a noun sense is part of CSL, both its singular and
plural forms may be used. Note, however, that phrases which
function as more than one part of speech are uncommon. This
heuristic is therefore less relevant in the case of an ambiguous
phrase.

A vocabulary is the collection of words and phrases used in a
particular language or sublanguage. A limited domain will be
referred to by means of a limited vocabulary which is used to
communicate or express information about a limited realm of
experience. An example of a limited domain might be farming, where
the limited vocabulary would include terms concerning farm
equipment and activities. The MT component will operate on more
than one kind of vocabulary. The words and phrases for machine
translation will be stored in the MT lexicon. The vocabulary can be
divided into different classes: (1) functional items; (2) general
content items; and (3) technical nomenclature.

Functional items in English are the single words and word
combinations which serve primarily to connect ideas in a sentence.
They are required for almost any type of written communication in
English. This class includes prepositions (to, from, with, in front
of, etc.), conjunctions (and, but, or, if, when, because, since,
while, etc.), determiners (the, a, your, most of), pronouns (it,
something, anybody, etc.), some adverbs (no, never, always, not,
slowly, etc.), and auxiliary verbs (should, may, ought, must,
etc.).

General content words are used in large measure to describe the
world around us; their main use is to reflect the usual and common
human experience. Typically, documents focus on a very specialized
part of the human experience (e.g., machines and their upkeep). As
such, the general vocabulary will be relatively restricted for MT.

The technical nomenclature comprises technical content words and
phrases, and user application specific vocabulary. Technical
content items are words and phrases which are specific to a
particular field of endeavor or domain. Most technical words are
nouns, used to name items, such as parts, components, machines, or
materials. They may, however, also include other classes of words,
such as verbs, adjectives, and adverbs. Obviously, as these words
are not used in common, everyday conversation, they contrast with
general content words.

Technical content phrases are multiple-word sequences built up from
all the preceding classes. These phrases are the most
characteristic form of technical documentation vocabulary. The user
application specific vocabulary is the part of the terminology that
contains distinctly user application created words and complex
terms. These include the following: product names, titles of
documents, acronyms used by the user, and form numbers.

The development of a useful and complete vocabulary is important
for any documentation effort. When documentation is subsequently
translated, the vocabulary becomes an important resource for the
translation effort. The MT 120 is designed to handle most
functional items available in English, except those referring to
very personal (I, me, my, etc.) or gender-based (hers, she, etc.)
or other pronominal (it, them, etc.) usage. This will include a
number of technical borrowings from English general words (such as
"truck" or "length"). The vast majority of the constrained language
vocabulary, then, will consist of the special (e.g., technical)
terms of one or more words, which express the objects and processes
of the special domain. To the extent that the vocabulary is able to
express the full range of notions about the special domain, the
vocabulary is said to be complete.

The development of a streamlined but complete vocabulary
contributes greatly to the success of the IATS system 105. The
constrained language, by specifying proper and improper use of
vocabulary, will assure that the documents can be produced in a
manner conducive to fast, accurate, and high-quality machine
translation.

Vocabulary items should reflect clear ideas and be appropriate for
the target readership. Terms which are sexist, colloquial,
idiomatic, overly complicated or technical, obscure, or which in
other ways inhibit communication should be avoided. These and other
generally accepted stylistic considerations, while not necessarily
mandatory for MT-oriented processing, are nevertheless important
guidelines for document production in general.

It should be noted that although the bulk of the discussion in this
document concerning the constrained source language and/or language
in general centers around American English, analogous comparisons
can be made in connection with all other languages. There is
nothing inherent about the system 100 described herein that
requires American English to be the source language. In fact, the
system 100 is not designed to work with American English as the
only source language. However, the databases (e.g., the domain
model) that interact with the LE 130 and MT 120 will have to be
changed to correspond to the constraints of the particular source
language.

The rules of standard American English orthography must be
followed. Non-standard spellings, such as "thru" for "through,"
"moulding" for "molding," or "hodometer" for "odometer" are to be
avoided. Capitalized words (e.g., On-Off, Value Planned Repair)
should only be used to indicate special meaning of terms. These
terms must be listed in the user application vocabulary. Such is
also the case for non-standard capitalization usage (BrakeSaver).
Likewise, abbreviations, when used (ROPS, API, PIN), must be listed
in the user application specific vocabulary. The format for
numbers, units of measurement, and dates must be consistent.

Constrained language recovery items should also be used according
to their constrained language meaning. In doing so, the writer
assures that the MT always translates a word by using the proper
constrained language word sense. Some English words can also belong
to more than one syntactic category. In the constrained language,
all syntactically ambiguous words should be used in constructions
that disambiguate them.

One difficult problem arising from the special nature of the domain
is, in some fields, the frequent use of lengthy compound nouns. The
modification relationships present in such compound nouns are
expressed differently in different languages. Since it is not
always feasible to recover these relationships from the source text
and express them in the target language, complex compound nouns
with the following characteristics may be listed in the MT lexicon:

Technical terms from the user application specific vocabulary; and
Compound terms consisting of more than one word.

Complicated noun-noun compounding should be avoided, if possible.
However, with some items [...], the MT is capable of handling this
important characteristic of documentation. Note that noun-noun
compounding, which is a feature of the English language, may not
necessarily be a common feature of other languages, and as such,
the constraints under which the language is created differ with the
particular language being utilized.

English is very rich in verb-particle combinations, where a verb is
combined with a preposition, adverb, or other part of speech. [...]
from the verb by objects or other [...] causes complexity and
ambiguity in MT processing of the input text. Accordingly,
verb-particle combinations should be rewritten wherever possible.
This can usually be accomplished by using a single-word verb
instead. For example, use:

"must" or "need" in place of "have to";
"consult" in place of "refer to";
"start the motor" in place of "turn the motor on".

Full terms and ideas should be used wherever possible. This is
particularly important where misunderstandings may arise. For
example, in the phrase:

"Use a monkey wrench to loosen the bolt. . . ."

the word wrench must not be omitted. While most technically capable
people would understand the implication without this word, it must
be rendered explicit during the translation process. CTE text must
have vocabulary which is explicitly expressed wherever possible;
abbreviations or shortened terms should be rewritten into lexically
complete expressions.

Consider another example:

"If the electrolyte density indicates that . . ."

Here the meaning is more explicit and complete when the idea is
fully expressed:

"If measurement of the electrolyte density indicates that . . ."

Finally, the following sentences have words or phrases missing; the
underlined words are supplied to make the meaning more redundant:

Turn the start switch key to OFF and remove the key.
Pull the backrest (1) up, and move the backrest to the desired
position.
Jump starting: make sure the machines do not touch each other.

When such "gaps" are filled, the idea is more complete and a
meaningful translation by IATS 105 becomes more certain.
Translation errors due to gaps are a common reason for postediting.
Hence, gaps are disallowed.

Colloquial or spoken English often favors the use of very general
words. This may sometimes result in a degree of vagueness which
must be resolved during the translation process. For example, words
such as conditions, remove, facilities, procedure, go, do, is for,
make, get, etc. are correct but imprecise.

In a sentence like:

When the temperature reaches 32° F, you must take special
precautions,

the word "reaches" does not communicate whether the temperature is
dropping or rising; one of these two terms would be more exact
here, and the text just as readable.

Some languages make distinctions where English does not always do
so; for example, we say oil for either a lubricating fluid, or one
used for combustion; we say fuel whether or not it is diesel.
Similarly, when the word door is used in isolation, it is not
always possible to tell what kind of door is meant. A car door? A
building door? A compartment door? Other languages may need to make
these distinctions. Wherever possible, full terms should be used in
English.

D. Domain Model

Knowledge-based Machine Translation (KBMT) must be supported by
world knowledge and by linguistic semantic knowledge about meanings
of lexical units and their combinations. A KBMT knowledge base must
be able to represent not only a general, taxonomic domain of object
types such as "car is a kind of vehicle," "a door handle is a part
of a door," artifacts are characterized by (among other properties)
the property "made-by"; it must also represent knowledge about
particular instances of object types (e.g., "IBM" can be included
into the domain model as a marked instance of the object type
"corporation") as well as instances of (potentially complex) event
types (e.g., the election of George Bush as president of the United
States is a marked instance of the complex action "to-elect"). The
ontological part of the knowledge base takes the form of a
multihierarchy of concepts connected through taxonomy-building
links, such as is-a, part-of, and some others. We call the
resulting structure a multihierarchy because concepts are allowed
to have multiple parents on each link type.

The domain model or concept lexicon contains an ontological model,
which provides uniform definitions of basic categories (such as
objects, event-types, relations, properties, episodes, etc.) used
as building blocks for descriptions of particular domains. This
"world" model is relatively static and is organized as a multiply
interconnected network of ontological concepts. The general
development of an ontology of an application (sub)world is well
known in the art. See, for example, Brachman and Schmolze, An
Overview of the KL-ONE Knowledge Representation System, Cognitive
Science, vol. 9, 1985; Lenat, et al, Cyc: Using Common Sense
Knowledge to Overcome Brittleness and Knowledge Acquisition
Bottlenecks, AI Magazine, VI:65-85, 1985; Hobbs, Overview of the
Tacitus Project, Computational Linguistics, 12:3, 1986; and
Nirenburg et al, Acquisition of Very Large Knowledge Bases:
Methodology Tools and Applications, Center for Machine Translation,
Carnegie Mellon University (1988), all of which are incorporated
herein by reference.

The ontology is a language-independent conceptual representation of
a specific subworld, such as heavy equipment troubleshooting and
repair or the interaction between personal computers and their
users. It provides the semantic information necessary in the
sublanguage domain for parsing source text into interlingua text
and generating target texts from interlingua texts. The domain
model has to be of sufficient detail to provide sufficient semantic
restrictions that eliminate ambiguities in parsing; the ontological
model must provide uniform definitions of basic ontological
categories that are the building blocks for descriptions of
particular domains.

In a world model, the ontological concepts can be first subdivided
into objects, events, forces (introduced to account for
intentionless agents) and properties. Properties can be further
subdivided into relations and attributes. Relations will be defined
as mappings among concepts (e.g., "belongs-to" is a relation, since
it maps an object into the set {*human *organization}), while
attributes will be defined as mappings of concepts into specially
defined value sets (e.g., "temperature" is an attribute that maps
physical objects into values on the semi-open scale [0,*], with the
granularity of degrees on the Kelvin scale). Concepts are typically
represented as frames whose slots are properties fully defined in
the system.

Domain models are a necessary part of any knowledge-based system,
not only a knowledge-based machine translation one. The domain
model is a semantic hierarchy of concepts that occur in the
translation domain. For instance, we may define the object
*O-VEHICLE to include *O-WHEELED-VEHICLE and *O-TRACKED-VEHICLE,
and the former to include *O-TRUCK, *O-WHEELED-TRACTOR, and so on.
At the bottom of this hierarchy are the specific concepts
corresponding to terminology in CSL. We call this bottom part the
shared K/DM. In order to translate accurately we must place
semantic restrictions on the roles that different concepts play.
For instance, the fact that the agent role of an *E-DRIVE action
must be filled by a human is a semantic restriction placed on
*O-VEHICLE, and automatically inherited by all types of vehicles
(thus saving repetitious work in hand coding each example). The
Authoring part of the domain model augments the K/DM with synonyms
not in CSL and other information to provide useful feedback to the
author as he or she composes each information element.
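
The inheritance of semantic restrictions through the concept
hierarchy can be sketched as follows. This is a minimal
illustration in Python under stated assumptions, not the
implementation of the domain model: the class `Concept`, its
method names, the helper `agent_allowed`, and the concept name
`*O-HUMAN` are hypothetical and introduced only for the example.
The remaining concept names are those used in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A frame in a hypothetical concept multihierarchy."""
    name: str
    parents: list = field(default_factory=list)        # is-a links
    restrictions: dict = field(default_factory=dict)   # (event, role) -> filler

    def is_a(self, other):
        """True if this concept is `other` or descends from it."""
        return self is other or any(p.is_a(other) for p in self.parents)

    def restriction(self, event, role):
        """Look up a restriction locally, then along the is-a links."""
        if (event, role) in self.restrictions:
            return self.restrictions[(event, role)]
        for parent in self.parents:
            found = parent.restriction(event, role)
            if found is not None:
                return found
        return None


human = Concept("*O-HUMAN")  # assumed concept name, not from the text
# The restriction is stated once, on *O-VEHICLE.
vehicle = Concept("*O-VEHICLE",
                  restrictions={("*E-DRIVE", "agent"): human})
wheeled = Concept("*O-WHEELED-VEHICLE", parents=[vehicle])
tracked = Concept("*O-TRACKED-VEHICLE", parents=[vehicle])
truck = Concept("*O-TRUCK", parents=[wheeled])
tractor = Concept("*O-WHEELED-TRACTOR", parents=[wheeled])


def agent_allowed(patient, agent, event="*E-DRIVE"):
    """Check a candidate agent against the restriction inherited by `patient`."""
    required = patient.restriction(event, "agent")
    return required is None or agent.is_a(required)
```

Because `parents` is a list, a concept may have several parents on
the same link type, which is the property the text calls a
multihierarchy. In this sketch `*O-TRUCK` carries no restriction of
its own, yet `truck.restriction("*E-DRIVE", "agent")` resolves to
the filler declared on `*O-VEHICLE`.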

FIG. 5 conceptually illustrates the Domain Model (DM) used by the
present invention. The DM 500 is a representation of the
declarative knowledge about the CSL vocabulary used by the MT 120
and the LE 130. The DM 500 is made up of three distinct parts:

1. A Kernel Domain Model (K/DM) 510 contains all lexical
information that is required by both the MT analyzer 127 and the LE
130; in particular, the kernel includes all CSL lexical items
(words and phrases) with associated semantic concepts, parts of
speech, morphological information, etc.

2. A MT Domain Model (MT/DM) 520 contains information that is
required only by the MT analyzer 127. The MT Domain Model is the
hierarchy of concepts used for unambiguous mapping and semantic
verification in translation. It includes selectional restrictions
on concepts and a hierarchical classification of concepts.

3. A LE Domain Model (LE/DM) 530 contains information that is
required only by the LE 130; this includes non-CSL synonyms for CSL
lexical items, dictionary definitions of CSL lexical items, and
examples of the CSL lexical items in use.

The Kernel/DM 510 will contain one lexical entry for every CSL
lexical item (word or phrase). (A "lexical entry" consists of a
lexical item, i.e., a word or phrase, and minimally its associated
semantic concept and part of speech; for example, if the word
"leak" is in CSL as both a noun and a verb, it would have two
lexical entries.) Each lexical item will be associated with
additional information required by the LE 130 and/or the MT 120
such as a definition and irregular morphological variants.

The shared K/DM 510 speeds up refinements and extensions of the
CSL, saves duplication of effort in the authoring and translation
components, and provides a human readable structure to facilitate
maintenance and extensions.

The K/DM 510 is a lexicon containing both the syntactic and
semantic information about terms (words and phrases) in the
constrained language text. It is the central lexical knowledge
source for the analysis side of the automated machine translation
(MT) process. The K/DM 510 is also used as the basis for the LE/DM.

The K/DM 510 includes a separate entry for each term in each
syntactic category. (Thus, for a word like "truck," which is both a
noun and a verb, there are two entries.) K/DM entries contain the
following information:

root (e.g., "truck");
part of speech (e.g., N);
for content words, the concept or meaning (e.g., *O-TRUCK);
morphological information (e.g., irregular inflections);
syntactic information (e.g., whether a noun is count or mass);
definitional information: short definitions and textual examples
documenting the different senses and uses of the words, and a
specification of the sense in which the word is to be used in the
constrained language.

The DM 500 is defined in three sets of external human-readable
files which can be read by the process(es) that require their use.
Since the MT 120 and the LE 130 will be running in separate
processes, the information in the model is represented internally
in two forms: one for the parts of the DM required by the MT 120
and another for the part required by the LE 130. So the K/DM 510 is
defined in a set of files which can be represented in both forms;
the LE/DM 530 is only represented in the form used by the LE 130;
and the MT/DM 520 is only represented in the form used by the MT
120. Described below are the external file formats, the content of
the various parts of the DM, and the internal representation of the
information used by the LE 130.

Once again, the K/DM contains all information required by both the
MT 120 and the LE 130. This includes a CSL lexical item (the base
word, phrase, or quoted term) and a semantic concept (the semantic
concept associated with the lexical item, represented in a lexical
entry by a "concept name"). Further, it includes a part of speech
(one of a fixed set of parts of speech, e.g., verb, adjective,
etc.), a definition (a rough definition for general vocabulary
terms, to clarify which of several senses a CSL lexical item may
have), and irregular morphological variants (a listing of irregular
morphological forms and the name of the morphological
transformation for each). Examples of names of morphological
transformations for verbs are "past", "third person singular
present", "past participle", "present participle". The value of
this field for the word "drive", for example, would be ((past
drove) (past participle driven)), indicating that those two forms
of the verb are irregular and all other forms are regular. Finally,
the K/DM includes typographical restrictions, e.g., if the lexical
item must be in capitals, have the first character capitalized,
etc.

The MT/DM 520 contains information required only by the MT. This
includes: selectional restrictions on concepts, and hierarchical
classification of concepts for organization of selectional
restrictions.

The LE/DM 530 will contain [...] to help the author [...] valid CSL
lexical items. [...] Kernel and the LE/DM will contain all
information and all restrictions required to characterize the CSL
lexicon in support of the [...] Vocabulary Checker (described
below), [...] contains additional information required only by the
Vocabulary Checker. This includes: a dictionary definition (the
definition of the word or phrase that will be presented to authors
by the [...]), non-CSL synonyms (synonyms for the CSL lexical items
that authors might use in writing documents), and a usage example
(an example of the word or phrase in a CSL sentence, for
presentation to the authors by the [...]).

The purpose of including this information in the LE/DM is to help
the authors ensure that their writing is made up of valid CSL words
and phrases. The dictionary definitions and usage examples will
help the authors ensure that they are using a word or phrase of a
part of speech and with a meaning that is permitted in CSL;
however, dictionary definitions or usage examples will not be
required for every CSL lexical item. Rather, they will be required
only for the small percentage of ambiguous or vague terms whose CSL
meaning will not be immediately clear to authors. This probably
amounts to less than half of the lexical items in the DM. For
example, function words like "for" and "the" will not require
definitions or examples; many technical terms, especially those
with very specific technical meanings, may not require definitions
or examples either.

The non-CSL synonyms in the LE/DM will help authors who write a
non-CSL word or phrase to choose a synonymous or related CSL word
or phrase with which to replace it. It is desirable for the
vocabulary checker to provide information about not only synonyms
which are the same part of speech as the non-CSL word with which
they are synonymous, but also about related words that might aid
authors in rewording sentences. If the latter are included, the
LE/DM must contain information about these related words in
addition to the mandatory content.

E. Language Editor

Referring to FIG. 1(6), the constrained language editor (LE) 130 is
a set of tools to support authors and editors in creating documents
within the bounds of CSL. These tools will help an author to use
the appropriate CSL vocabulary and grammar to write service
documentation. The LE 130 is built as an "extension" of the SGML
text editor 140. Although the LE 130 uses the same communication
channels as the SGML text editor 140, the functions of the two are
mutually exclusive. However, the user interface used to interact
with the LE 130 is a "seamless extension" of the SGML text editor
interface.

The author 160 creates documents in the SGML text
editor 140 and invokes the LE 130 from it. The LE 130 informs
the author whether individual words in a document are 
non-CSL, and will be able to suggest synonyms in CSL for 
words that are relevant to the user application information 
domain, but are not in CSL. In addition, the LE 130 tells the 
author whether or not the text in a file satisfies CSL syntactic 
constraints. 

The LE 130 software includes the following: a Vocabulary 
Checker, a Grammar Checker, including an interface 
through the MT Syntactic Analyzer, which will provide the 
core grammar checking functionality, and a User Interface 
(UI). In addition, the CSL vocabulary information used by 
the CSL LE will be represented in the K/DM and the 
LE/DM. 

The LE 130 will certify that all vocabulary and sentence 
structures in a document conform to the CSL specification. 
The LE 130 marks the document with an SGML tag that 
represents this CSL approval. Checking must be performed 
on all text in a document, which includes the following: 
sentences, headings, list items, captions, call-outs in 
graphics, and information in tables. 

Since the present invention is based on the premise that
authors should be as productive as possible during a CSL
checking session, and that authors should not have to work
on multiple documents at once, a batch mode of
operation, which requires a user to submit a document for
processing and wait until the entire document is finished
before he or she gets any feedback, is not appropriate. The
LE 130 provides an interactive mode of operation for
vocabulary checking, grammar checking, and interactive
disambiguation.

FIG. 6 shows a high level flow chart of the operation of
the LE 130. The LE 130 takes in as input text 605, which
may be ambiguous and unconstrained. The potentially
ambiguous unconstrained input text 605 is first checked with
a vocabulary checker 610 which performs its functions (as
described below) with the aid of a spell checker 615. (The
services of the spell checker happen to be rendered in this
embodiment by the spell checker regularly featured by the
host TE 140.) Once the vocabulary checker 610 has completed
its check and made all necessary corrections (with the
aid of the author), the lexically constrained text 617 is
supplied to a grammar checker 620. The grammar checker
620 produces syntactically correct CSL text 625. The constrained
syntactically correct text 625 is then disambiguated,
as shown in block 630. The result of the disambiguation is
translatable unambiguous constrained text 635. The translatable
text 635 can be translated into a foreign language
without any pre-editing required. The accuracy of the resulting
translation also makes postediting unnecessary.
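
The ordering of the stages in FIG. 6 can be illustrated with a minimal sketch. The stage functions, the toy word list and the name `run_stages` below are assumptions made for illustration only; they are not the checkers of the LE 130, which rely on the domain model and the MT analyzer.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns a list of problems
# (an empty list means the text passed that stage).
Stage = Tuple[str, Callable[[str], List[str]]]

CSL_WORDS = {"open", "the", "valve"}  # toy vocabulary, hypothetical


def vocabulary(text: str) -> List[str]:
    """Toy stand-in for the vocabulary checker 610: report unknown words."""
    words = text.lower().strip(".").split()
    return [w for w in words if w not in CSL_WORDS]


def grammar(text: str) -> List[str]:
    """Toy stand-in for the grammar checker 620: one trivial rule."""
    return [] if text.endswith(".") else ["sentence must end with a period"]


def disambiguation(text: str) -> List[str]:
    """Toy stand-in for block 630: this sketch never finds ambiguity."""
    return []


STAGES: List[Stage] = [
    ("vocabulary", vocabulary),
    ("grammar", grammar),
    ("disambiguation", disambiguation),
]


def run_stages(text: str, stages: List[Stage]) -> dict:
    """Run the stages in order and stop at the first one that reports problems."""
    for name, check in stages:
        problems = check(text)
        if problems:
            return {"approved": False, "stage": name, "problems": problems}
    return {"approved": True, "stage": None, "problems": []}
```

In this sketch a text is approved only when every stage returns no problems, which mirrors the requirement that text reach the translator already lexically constrained, syntactically correct and disambiguated.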

1. Vocabulary Checker 

FIG. 7 shows a flow chart of the operation of vocabulary
checker 610. The vocabulary checker 610 identifies words
not known to be CSL. The vocabulary checker 610 identifies
occurrences of non-CSL words in an author's text, and
helps an author find valid CSL replacements for non-CSL 
words. It recognizes word boundaries in a document and 
identifies every instance of a lexical item that is not known 
to be CSL. 






As shown in block 706, the first term of a unit is selected 
to be checked. The term is then checked, as shown in block 
710, against a CSL lexical database (i.e., dictionary) which 
contains all CSL words. If the term is not found in the CSL 
dictionary, the term is then spell checked against a standard 
dictionary, as shown in block 722. If the word has been 
misspelled, the author is provided a means of correcting the 
spelling mistake (i.e., the vocabulary checker 610 displays 
spelling alternatives), as shown in block 726. 

The item is then checked to determine whether it is in the 
CSL vocabulary, as shown in block 734. If the item is in the 
CSL vocabulary, then the procedure advances to block 718. 
However, if the item is not in the CSL vocabulary, the 
system checks to see if the LE/DM contains a synonym for 
the item being checked, as shown in block 736. If at least one
synonym exists in the LE/DM, the system displays the 
synonym(s) which are part of the CSL vocabulary and 
allows the author to make a selection, as shown in block 738. 
However, should the LE/DM not have a synonym for the 
item under checking, the author has the opportunity to 
rework her input, as shown in block 740. The outcome of 
this rework goes back to block 710. Once a legal selection 
has been made by the author, the procedure 700 then 
proceeds to block 718. 

When a non-CSL word is identified, the author has the 
following options: she can select an alternative and substi- 
tute it for the word in the document, or she can enter a new 
item and substitute it for the word in the document. 
Typically, the author selects one of the synonyms to replace 
the non-CSL item. If the author should decide to skip the 
problem, the lack of resolution would result in failure of the 
text to be approved as CSL. 

Block 718 checks to determine whether there are any 
more terms in the unit. If there are no more terms the 
procedure 700 stops. Otherwise the next term is selected, as 
shown in block 714, and the procedure 700 begins again 
from block 710. 
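
The loop of FIG. 7 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `check_unit`, the dictionary-based spelling and synonym tables, and the `pick` and `rework` callbacks (which stand in for the author's interactive choices) are all hypothetical, and the `max_rounds` limit is added only so that the sketch always terminates.

```python
def check_unit(terms, csl, spelling, synonyms, pick, rework, max_rounds=5):
    """Return the terms of one unit after every term has been made CSL.

    terms    -- list of words in the unit being checked
    csl      -- set of valid CSL words (the CSL dictionary)
    spelling -- dict mapping a misspelling to its correction
    synonyms -- dict mapping a non-CSL word to candidate replacements
    pick     -- callback: (term, options) -> option chosen by the author
    rework   -- callback: (term) -> new wording entered by the author
    """
    checked = []
    for term in terms:                                 # blocks 706 and 714
        rounds = 0
        while term not in csl:                         # blocks 710 and 734
            if rounds == max_rounds:
                raise ValueError(f"unresolved non-CSL term: {term!r}")
            rounds += 1
            correction = spelling.get(term)            # block 722
            if correction is not None:
                term = correction                      # block 726
                continue                               # recheck against CSL
            # Only synonyms that are themselves CSL are offered (block 736).
            options = [s for s in synonyms.get(term, []) if s in csl]
            if options:
                term = pick(term, options)             # block 738
            else:
                term = rework(term)                    # block 740, back to 710
        checked.append(term)                           # block 718
    return checked
```

For example, with a CSL set containing "open", "the", "valve", "and" and "allow", the unit "open the vlave and let" comes back as "open the valve and allow" when the author accepts the first offered alternative. A term that is never resolved raises an error here, which corresponds to the text failing to be approved as CSL.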

In particular, the Vocabulary Checker 610 identifies every
instance of a lexical item that is not known to be CSL. For
each such word, the Vocabulary Checker 610 will determine
which of the following descriptions is applicable and report
supporting information to the user interface as listed below:

a non-CSL word having known CSL synonyms; in this
case the Vocabulary Checker 610 will identify the
synonyms. For instance, let us assume that the word
"let" is non-CSL:
Author's Input, When Checked: Open the valve and let
more nitrogen go to the accumulator.
VC Message: The term is non-CSL, but there are related
CSL alternatives.
CSL Alternatives: allow, allowed, enable, enabled,
permit, permitted, leave, left
CSL Sentence as Edited: Open the valve and allow more
nitrogen to go to the accumulator.

a word which may only appear in CSL as part of a phrase,
but which is not used in a CSL phrase in the current
context; in this case the Vocabulary Checker 610 will
report acceptable CSL phrases containing the word:
Author's Input, When Checked: The first time the valve
lash is checked, the injector timing should be checked.
VC Message: The term is used in a non-CSL context.
CSL Alternatives: advance signal timing, advance timing
groove, timing gear, timing mechanism
CSL Sentence as Edited: The first time the valve lash is
checked, the injector timing mechanism should be
checked.

a word or phrase which must appear within double
quotation marks in CSL, but which is not enclosed in
quotation marks in the current context; in this case the
Vocabulary Checker 610 will report that the term
should be quoted:
Author's Input, When Checked: For more details, read the
Testing and Adjusting article in the next section.
VC Message: This term is generally enclosed by quotes.
CSL Alternative: None
CSL Sentence as Edited: For more details, read the
"Testing and Adjusting" article in the next section.

a word or phrase which must appear with specific,
mandatory capitalization in CSL, but which lacks that
capitalization in the current context (e.g., an acronym
presented in lower case); in this case the Vocabulary
Checker 610 will report the correct CSL form(s):
Author's Input, When Checked: Turn the screw until the
pressure gauge reads 0 kpa (0 psi).
VC Message: The term is improperly capitalized.
CSL Alternative: kPa
CSL Sentence as Edited: Turn the screw until the pressure
gauge reads 0 kPa (0 psi).

a non-word (that is, a group of letters representing a
misspelled word) that has known spelling alternatives;
in this case the Vocabulary Checker 610 will identify
the spelling alternatives, regardless of whether the
result is in CSL (the user will resubmit the chosen
alternative for further checking):
Author's Input, When Checked: When it is necessary to
raise the boom, the boom must have correct support.
VC Message: The term is non-CSL.
CSL Alternative: necessary
CSL Sentence as Edited: When it is necessary to raise the
boom, the boom must have correct support.

a word that is not in CSL and about which the system
knows nothing. The message for an unknown word or
phrase gives the author the opportunity to change the
wording altogether or shield the illegal expression from
checking, as the case may require. In the following
example, the author uses an SGML tag to tell the
system to overlook the offensive language and leave it
intact.
Author's Input, When Checked: Put approximately 0.9 L
(1 quart) of SAE10W hydraulic oil in the nitrogen end
of the accumulator.
VC Message: The term is unknown.
CSL Alternative: None
CSL Sentence as Edited: Put approximately 0.9 L (1
quart) of <sic>SAE10W</sic> hydraulic oil in the nitrogen
end of the accumulator.

a punctuation mark or special symbol that is not allowed
in CSL in any context.

In cases where a non-CSL word has no direct CSL
synonyms (that is, words that could replace it directly in a
document), the system can identify related CSL words or
phrases which an author could use to express the intended
idea. This functionality provides authors with additional
support in rewording a sentence to include only CSL
vocabulary. However, changes to use these related words
could not be completed with the automatic replacement
facility provided for synonyms, since the changes would
require some modifications to the sentence structure. For
example, if "can" was in CSL and "capable" was not, an
author who wrote the following sentence

The system is capable of being programmed for several
customer-specified parameters,

would be told that "capable" was not a CSL
word. Although the word "can" is CSL, neither the
word "capable" nor the phrase "is capable of"
can be directly replaced with "can" without the need
for further changes to the sentence.

2. Grammar Checker

The purpose of the Grammar Checker is to identify places
where an author's text does not conform to CSL grammatical
restrictions, and to focus the author's attention on those
places. The grammar checker 620 functionality will be
provided by the Analysis module 127 of the MT system 120,
extended to allow the system to report instances of syntactic
and semantic ambiguity. The grammar checker interface
allows the author to respond interactively to requests for
clarification of ambiguity. It is possible that a sentence can
be in constrained language but that it may have more than one
interpretation. The grammar checker interface will present
some indication of the two or more possible meanings of the
sentence to the author and request clarification. An example
of an ambiguous sentence would be: "Check the cylinders
on the inside." Are the cylinders located on the inside or are
you supposed to check the inside of the cylinders? There are
two kinds of possible ambiguities:

Lexical ambiguities. Lexical ambiguities occur where a
word can have one or more meanings in the constrained
language. While it is desirable that in the constrained
language each word should have only one meaning per
part of speech, there are some words which will have
more than one meaning. For example, the word "gas"
can have the meaning "natural gas" or "gasoline."

At the lexical level, too, the problem may be caused by
one word which can be used in two different syntactic roles
in CSL. Such is the case of "fuel", which can be either a
noun or a verb in CSL. When the author writes a sentence
where the syntactic role is not clear, the Grammar Checker
(GC) 620 may prompt the author as follows.
Author's Input, When Checked: ... sensor is attached to
...
GC Message: The term may be used as a noun or as a
verb.

At this point, the author has the option of editing the
sentence without help from the system (which simply
requires rewriting and submitting again to the checker). If
the author opts to request help, the system may offer
specific instructions to deal with problems of the same type.
In this case the help is specific:
Help!
GC Message: If the word is a noun, you may want to use
a determiner before it. If it is a verb, can a determiner
after it help? Example: The ship sinks vs. Ship the
sinks.

The author then proceeds to edit the sentence and submits
it to the grammar checker 620 again.

Structural ambiguity. Structural ambiguity occurs where
words in a sentence may group together in more than
one way. For example: "Remove the valve with the
lever." Does the phrase "with the lever" form a unit
with the phrase "the valve," or does it, instead, form a
unit with the verb "remove"? In other words, is this a
sentence about a valve that has a lever attached to it or
is it about using a lever to remove a valve?

In the IATS 105, the component designed to answer this
question is the domain model 137, which is constructed in
such a way as to minimize the occurrence of such ambiguities.

As shown in FIG. 5, the DM/MT 520, which supports
exclusively the machine translation process, contains two
types of information. On the one hand, the semantic information
(A) supports the identification of relationships
between concepts. On the other hand, the contextual information
(B) specifies for a particular verb the so-called deep
cases or arguments that such verb can take. In the example
under consideration, let us consider first how the semantic
information (A) and the contextual information (B) help the
analyzer 127 determine the grammatical structure of
"Remove the valve with the lever".

Among many semantic relationships, there is a relationship
"is a part of" which obtains, for instance, between the
concept "hat" and the concept "costume", where the "hat"
"is a part of" the "costume". The same relationship obtains
between the concept "sole" and the concept "shoe", "heel"
and "shoe", etc. The semantic information (A) held in the
DM/MT 520 identifies this and other semantic relationships
between the concepts in the domain.

When the process in the MT analyzer 127 goes to the
DM/MT 520 for semantic information concerning the relationship
between the concept "valve" and the concept
"lever", the information in the DM 137 will not enable the
MT analyzer 127 to tell whether "lever" "is a part of"
"valve": the knowledge about such relationship is just not
there. So the MT analyzer 127 is still at a loss as to whether
the phrase "with the lever" should be attached to the word
"valve".

Now when the MT analyzer 127 turns to the contextual
information (B), it finds that the verb "remove" takes three
cases: a nominative (NOM), an accusative (ACC), and an
instrumental (INS) (at a deeper level of analysis, however,
than that of the Latin grammar of our school days). That is,
"remove" fits in the following case frame.

... (NOM, ACC, INS)

Based on this abstract pattern, we can build sentences 
such as the following. 



NOM            VERB           ACC         INS

The workman    removed        the sand    with a shovel
Peter          has removed    the box     with the nail
etc.









The DM/MT contains information about the combination
of the preposition "with" and nouns having the
semantic feature [+INSTRUMENT]: such combinations
form instrumental phrases. This information enables the
analyzer to determine that

a) since "lever" is [+INSTRUMENT], "with the lever" is
INS;

b) since "remove" can take the INS case, the phrase "with
the lever" attaches to, fits together with, and is interpreted
as modifying "remove".
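
The reasoning in steps a) and b) can be sketched with three small lookup tables. The table names, their contents and the function `attachments` are assumptions chosen for illustration; the actual DM/MT 520 is a much richer knowledge source and is not organized as Python dictionaries.

```python
# Contextual information (B): the deep cases each verb can take.
CASE_FRAMES = {"remove": {"NOM", "ACC", "INS"}}

# Semantic features of nouns, e.g. [+INSTRUMENT].
FEATURES = {"lever": {"INSTRUMENT"}, "shovel": {"INSTRUMENT"}}

# Semantic information (A): "is a part of" pairs as (part, whole).
PART_OF = {("hat", "costume"), ("sole", "shoe"), ("heel", "shoe")}


def attachments(verb, obj, pp_noun):
    """List the attachments of 'with <pp_noun>' that the tables support.

    "verb" means the phrase is instrumental and modifies the verb;
    "noun" means the phrase describes a part of the object noun.
    """
    readings = []
    is_instrument = "INSTRUMENT" in FEATURES.get(pp_noun, set())
    if is_instrument and "INS" in CASE_FRAMES.get(verb, set()):
        readings.append("verb")
    if (pp_noun, obj) in PART_OF:
        readings.append("noun")
    return readings
```

Here `attachments("remove", "valve", "lever")` yields only the "verb" reading, matching the conclusion above. A result that is empty or that contains both readings would mean the tables cannot settle the attachment on their own.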

Yet the DM 137 can only be as rich as we build it. In those 
cases where the semantic information has not been devel- 
oped as fully as possible, the lexical entries in the domain 
may not be able to support the disambiguation process 
performed by the MT analyzer 127. 

Consider the case of "nail" in "Peter has removed the box
with the nail". If the DM 137 contains the information about
nails being part of a wooden frame but fails to contain the
information that nails are [+INSTRUMENT], then the MT
analyzer 127 cannot possibly determine whether "with"
combines with "nail" to form an instrumental phrase. The
analyzer being unable to resolve the structural ambiguity, the
author will be asked to resolve it. When the text submitted
by the author undergoes grammar checking, the following
interaction occurs.
Author's Input, When Checked: Peter has removed the
box with the nail.

Grammar Checker 620 Message: The sentence is ambiguous.

1. Is the nail an instrument?

2. Does the "box" have a "nail"?

Once the author makes an interpretation choice, the
checker attaches an invisible SGML tag to the sentence,
which indicates to the system how the sentence should be
translated.

As mentioned above, the MT analyzer 127 is called by the
grammar checker in order to check whether input text or an
IE (or part thereof) conforms to the grammatical and semantic
constraints of CSL. In this regard, a preferred embodiment
returns a strict "green-light, red-light" message for
each sentence, the latter indicating that the author must
correct the composition of the flagged sentences via the
authoring environment. Once the entire input text or IE has
been certified as CSL compliant it may be stored away or
sent for immediate translation.

Referring to FIG. 8, a high level flow chart of the
grammar checker 620 (syntactical analysis) and disambiguation
checker 630 (semantic analysis) is shown. The word
"sentence" is used below to refer to the unit of text that
passes or fails the checking by the analysis module 127. The
unit that is checked may actually be a non-sentential text
component such as a heading, title, or list element, or a
caption or other text from a graphic. The grammar checker
620 recognizes sentence boundaries and SGML element
boundaries in an SGML marked-up text. It identifies every
sentence that does not conform to the CSL specification.
This will include every sentence which cannot be successfully
parsed by the MT Analysis module 127. The parsing
may fail for reasons including but not limited to those listed
below.

The sentence includes grammatical constructions which
the analysis module 127 will not parse. Such is the case,
for instance, when the sentence contains a reduced
relative clause. The reduction results from deleting the
relative pronoun "that" and the verb "be" in a sentence
like "Don't change the values that are programmed into
the unit".

Author's Input, When Checked: Don't change the values
programmed into the unit.
Grammar Checker Message: This sentence is difficult to
parse. Please check for one of the following problems:
Then the grammar checker 620 goes on to list the typical and
most frequent situations where parsing is made difficult if
not impossible through the use of grammatical constructions
not included in the repertoire of CSL.

The punctuation usage in the sentence does not conform
to CSL restrictions. As noted above, punctuation marks
and special characters which are not part of CSL in any
context will be flagged by the Vocabulary Checker 610.
However, the Vocabulary Checker 610 does not parse
input, so it will not report cases in which such an
element exists in CSL but has been used in the wrong
context. This kind of case will trigger a "fail" response
from the Grammar Checker 620.

A CSL vocabulary word was used in a syntactic form that
is not recognized for that word in CSL. The Vocabulary
Checker 610 will flag some of these cases; for example,
if the word "test" is included in CSL as a noun but not as
a verb, the Vocabulary Checker will report that the past
form "tested" is not CSL. However, the Vocabulary
Checker 610 will allow the present verb form "tests" to
pass, since that form is identical to the plural CSL noun
"tests". This case will trigger a "fail" response from the
Grammar Checker 620.
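
The division of labor described for "test", "tested" and "tests" can be sketched as follows. The lexicon layout, the naive regular-inflection rule in `surface_forms`, and both function names are assumptions for illustration; in particular, `grammar_check` is handed the syntactic role directly, whereas the Grammar Checker 620 would obtain it by parsing.

```python
def surface_forms(root, pos):
    """Surface forms of a lexicon entry, assuming regular inflection only."""
    if pos == "N":
        return {root, root + "s"}
    if pos == "V":
        return {root, root + "s", root + "ed", root + "ing"}
    return {root}


def vocabulary_check(word, lexicon):
    """Pass if the word matches any surface form of any entry (no parsing)."""
    return any(word in surface_forms(root, pos) for root, pos in lexicon)


def grammar_check(word, used_as, lexicon):
    """Pass only if the word is a form of an entry with the role it is used in."""
    return any(
        pos == used_as and word in surface_forms(root, pos)
        for root, pos in lexicon
    )


# "test" is in this toy CSL lexicon as a noun but not as a verb.
LEXICON = [("test", "N")]
```

With this lexicon, "tested" fails the vocabulary check, while "tests" passes it because it matches the plural noun. Only the role-aware check rejects "tests" used as a verb.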
The Grammar Checker 620 uses the MT Analysis module
127 (and the domain model 137) to identify sentences that
do not conform to CSL grammatical constraints; this is
known as syntactical analysis and is shown in block 805. For
each such sentence, the Grammar Checker 620 reports that
the sentence is not CSL. It is also possible for a sentence to
be in CSL but be ambiguous. Consequently, the present
invention provides semantic analysis as shown in block 710.
If the sentence being checked is semantically
ambiguous, the disambiguation checker 630 will present
some indication of the two or more possible meanings to the
author and request clarification, as shown in blocks 815 and
825. In a preferred embodiment, when a sentence fails the
Grammar Checker 620 and/or the disambiguation checker
630, the author has the following options: edit the document,
in cases of an ambiguous reading, disambiguate the
sentence, recheck the same input, or continue checking
without editing.

Note that the present invention implements absolute
adherence to constraints of vocabulary and grammar, rather
than just stylistic warnings or simple error detection (such as
subject-verb agreement).

If the sentence is semantically unambiguous, then it is
translated into Interlingua, as shown in block 720. Once the
document passes the grammar checker 620, a SGML tag
designating CSL approval can be inserted in the document.
In a preferred embodiment, the Grammar Checker 620
provides pass/fail feedback to the author 160. However,
more specific feedback other than pass/fail feedback can be
implemented.

For a more in depth discussion of grammar checking,
including disambiguation, see Tomita, M., "Sentence
Disambiguation by Asking," Computers and Translation,
1:39-51 (1986) and Carbonell, J. and M. Tomita,
"Knowledge-Based Machine Translation, the CMU
Approach," in S. Nirenburg (ed.), Machine Translation:
Theoretical and Methodological Issues, Cambridge:
Cambridge University Press, pgs. 68-89 (1987) both of which
are incorporated by reference.

F. Machine Translation

The MT 120 is an interlingua-type machine translation
system. In such systems, the constrained source language
(CSL) and the target language never come in direct contact.
The processing in such systems generally occurs in two
stages. First, representing the meaning of the CSL text in a
language-independent formal language, called interlingua,
and second, expressing this meaning using the lexical units
and syntactic constructions of the target language.

Interlingua MT systems, as well as other types of MT
systems are well known in the art. Detailed descriptions of
these different approaches to machine translation can be
found in Hutchins, Machine Translation: Past, Present,
Future, Ellis Horwood, Ltd., Chichester, UK, 1986, and
Zarechnak, The History of Machine Translation, in
Henisz-Dostert, McDonald, Zarechnak, eds., Machine Translation.
Trends in Linguistics: Studies and Monographs, Vol. 11, The
Hague, Mouton, 1979, both of which are herein incorporated
by reference in their entirety.

The meaning of the CSL text 350 is represented in the
specially designed knowledge representation scheme called
interlingua (which is well known in the art). Interlingua is in
turn represented in a frame notation and thus can be viewed
as a kind of semantic network. Like other artificial or formal
languages, interlingua has its own lexicon and syntax. The
lexicon is based on the domain from which the translated
texts are taken (e.g., computer maintenance, space
exploration, etc.). Thus, interlingua "nouns" are "object
concepts" in the ontology; interlingua verbs correspond,
roughly, to "events" in the ontology; and interlingua adjectives
and adverbs are the various "properties" defined in the
ontology. The ontology forms a densely connected network
for the various types of concepts, called the domain model.

Referring to FIG. 3 and FIG. 9, the Machine Translation
(MT) component 120 of the IATS 105 contains two main
sections. The first, the CSL analyzer 127, performs the first
processing stage of representing CSL text in interlingua. The
second main section, the Target Language Generator 123,
translates the interlingua representation of the
"CSL-approved" texts into a target language (e.g., French,
Japanese, Spanish). In performing both tasks, the MT component
120 runs as one or more independent server modules,
accepting translation requests from a human translation
controller (not shown). During target language generation,
target language generator 123 maps the Interlingua text 260
into the appropriate units of target language syntax to
produce high-quality output text 950 that requires no
postediting.

Once the MT analysis module 127 has produced Interlingua
text 260 for a certified CSL-compliant IE, that interlingua
may be stored away, delivered, or converted immediately
into a target language IE, or into an IE in each of
several target languages by the generator 123 (which
includes a semantics-to-syntax mapper and a Generation Kit
(Tomita M. and E. Nyberg, The Generation Kit and
Transformation Version 3.2 User's Manual, Technical Memo
(1988), available from the Center for Machine Translation,
Carnegie Mellon University, Pittsburgh, Pa.) sentence
generator proper). MT analyzer 127 and MT generator 123
interact in two ways. First the output of the former is the
input to the latter, and second they share some external
knowledge sources, especially the domain model 137.

The MT system 120 is subdivided, as shown in FIG. 9.
Analysis consists of a Parser 910 and an Interpreter 920. The
other half of the MT 120 can be divided into a Mapper 930
and a Generator 940. The oval circles in FIG. 9 stand for the
data that is produced and passed between the major software
modules.

The DM 137 (and specifically the MT/DM 520) is used in
three different ways during translation: (1) the parser 910
uses the DM 137 to constrain possible attachments (using
strict subcategorization of arguments and modifiers during
syntactic parsing); (2) the interpreter 920 uses the DM 137
to instantiate the appropriate domain concepts during
interpretation; (3) the mapper 930 uses the DM 137 to select the
appropriate target realization for each interlingua concept.

The MT 120 runs as one or more server processes. Each
such MT process accepts translation requests from the FMS
110 and returns the results. The requests contain
SGML-tagged CSL text and the results contain SGML-tagged target
language translations. Since translations into more than one



language may be going on at once, the requests also include
the desired target language. Since the MT server processes are
specialized by target language, a routing function is
involved. This routing function is performed automatically
by the FMS 110. The precise set of MT processes running at
a given time and their distribution across machines is
determined by the FMS 110, which will modify the mix
according to the set of translation jobs outstanding at any
particular time.

Referring to FIG. 9, the CSL Analyzer 127 consists of two
interconnected components: a syntactic parser 910 and a
semantic interpreter 920. Semantic interpreter 920 is also
known in the art as a mapping rule interpreter. The
syntactic parser 910 obtains the CSL text 305 input and
produces a syntactic structure for it. The syntactic parser 910
uses an LFG-type grammar. Lexical Functional Grammar
(LFG) is a formalized grammar which is well known in the
art of machine translation. As a result, the resultant syntactic
structure is an LFG f-structure 960. As soon as the
f-structure for the CSL sentence 960 is created, the semantic
interpreter 920 starts applying mapping rules in order to
substitute source language lexical units and syntactic
constructions with their interlingua translations. Lexical units
map into instances of domain concepts (e.g., the word "data"
will map into the interlingua "information"), while syntactic
structures map into conceptual relations (e.g., subjects of
sentences often map into the "agent" relations in
interlingua). See Mitamura, The Hierarchical Organization
of Predicate Frames for Interpretive Mapping in Natural
Language Processing, Center for Machine Translation,
Carnegie Mellon University (May 1990) which is incorporated
by reference.

The MT analyzer 127, guided by analysis knowledge
(data files), translates a CSL text 305 input sentence in the
source language into a semantic frame representation of the
meaning of the sentence. The knowledge structures brought
to bear in the analysis phase are the analysis grammars, the
mapping rules, and the concept lexicon.

The first part of the analysis is the parsing process, driven
by the syntactic analysis of the input sentence. The parser
910 uses the semantic restrictions embodied in the concept
lexicon (domain model) to guide its treatment of syntactic
ambiguities encountered in its analysis of the input. The
mapping rules mediate between the syntactic analysis
grammars and the concept lexicon.

In other words, when dealing with multiple languages the
linguistic structure is no longer a universal invariant that
transfers across all applications (as it was for pure English
language parsers), but rather is another dimension of
parameterization and extensibility. However, semantic information
can remain invariant across languages (though, of course,
not across domains). Therefore, it is crucial to keep semantic
knowledge sources separate from syntactic ones, so that if
new linguistic information is added it will apply across all
semantic domains, and if new semantic information is added
it will apply to all relevant languages. The universal parser
attempts to accomplish this factoring without making major
concessions to either run-time efficiency or semantic
accuracy.

The parser 910 is characterized by three kinds of knowledge
sources. One contains syntactic grammars for different
languages, another contains semantic knowledge bases for
different domains, and the third contains sets of rules which
map syntactic forms (words and phrases) into the semantic
knowledge structure. Each of the syntactic grammars is
completely independent of any specific domain; likewise,
each of the semantic knowledge
bases is independent of any specific language.

Further, the mapping rules are both language- and
domain-dependent, and a different set of mapping rules is
created for each language/domain combination. Syntactic
grammars, domain knowledge bases, and mapping rules are
written in a highly abstract, human-readable manner. This
organization makes them easy to extend or modify, but
possibly machine-inefficient for a run-time parser.

The function of the mapping rule interpreter 920 is to
generate and manipulate the syntactic and semantic structures
of a parse and, moreover, to generate these structures
simultaneously.

The universal parser 910 produces all the possible, that is,
The output of this analysis is syntactic f-structures con- valid, f-structures that can be derived from the sentences 
taining all applicable semantic information. This structure parsed. Each of these syntactic f-structurcs has semantic 
can be further processed by the second part of the MT 40 features, in accordance with LFG-theory these features are 
analyzer 127 to produce a semantical ly-organized frame created at the same time as the rest of the syntactic 
representation, in the form of the instantiation of the relevant f-structure. The semantic component may thus be regarded 
concepts from the concept lexicon that were encountered in as an additional feature of f-structures. 
parsing the sentence. The MT analyzer 127 arrives at this Thus the semantic component is a "visible" part of the 
form by retrieving the f-structure's semantic features; these 45 syntactic parse. The approach, of simultaneously creating 
features contain all relevant semantic information. the syntactic and semantic structures, has produced a system 
The syntactic parser 910 used in the present invention is able to eliminate "meaningless" partial parses before corn- 
well known in the art and is described in detail in Tomita and pleting them. Semantics are added to the syntactic structure 
Carbonell, The Universal Parser Architecture for when the lexicon is accessed for the definition of a word. 
Knowledge-Based Machine Translation, Technical Report, so Another part of the definition of a word is a set of structural 
Center for Machine Translation, Carnegie Mellon University mapping rules. These mapping rules are used when syntactic 
(May 1987) Tomita (ed.) et al., The Generalized LR Parser/ equations in grammar rules add infirmation to a syntactic 
Compiler Version 8.1: User's Guide, Technical Memo, Cen- structure. 

ter for Machine Translation, Carnegie Mellon University The text language generator component 123 takes inter- 

( April 1988) which are incorporated by reference. 55 lingua text 260 as its input and produces a target language 

One of the advantages of interlingua translation systems text 950 as its output. l*he target language generator 123 

over other types of MT systems is that the interlingua 260 is consist of two major modules, one semantic and one syn- 

language independent; that is, the subject and target lan- tactic. The semantic per forms the function of target language 

guages are never in direct contact. This allows the construe- lexical selection and choice of target language syntactic 

tionofa machine translation system in which potentially any <K) constructions; it is aided in these tasks by the generation 

source and target languages could be selected while requir- lexicon and the generation structure mapping rules, respec- 

ing minimal modifications to the computational structure. tively. The output of this module is an f-structure of the 

Clearly, then, any such system will need to be able to parse target language sentence that will be output by the system, 

numerous source languages. Hence, a universal parser is The goal of the generation module is to produce target 

needed which will take a language grammar as input, rather 65 language sentences from the interlingua text 260 frames 

than build the grammar into the interpreter proper. 'ITiis produced by the CSL analyzer 127. 'ITiere are three main 

allows greater extensibility and generality. steps in generation: 
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1. Lexical Selection. 

For each concept in the interlingua, the most appropri- 
ate lexical item must be selected. 

2. F-Structure Creation. 

A syntactic functional structure which determines the 
grammatical structure of the target utterance must be 
produced from the 1LT frames. 

3. Syntactic Generation. 

The syntactic functional structure is processed by the 
generation grammar to produce a target language 
sentence. 

The design of the generation module 940 combines recent 
research in the area of lexical selection with a map-and- 
generate paradigm that has been utilized in previous trans- 
lation systems. 
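By way of illustration only, the three generation steps may be sketched as follows. This is a minimal sketch, not the generation module 940 itself: the frame layout, the Spanish lexicon entries, the relation-to-function table, and the one-rule grammar (subject, verb, object) are all assumptions introduced for this example, and a nested dictionary merely stands in for an f-structure.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Toy interlingua frame: a domain concept plus its conceptual relations."""
    concept: str
    relations: dict = field(default_factory=dict)


# Hypothetical generation lexicon (concept -> Spanish lexical item).
GENERATION_LEXICON = {
    "CHECK": {"form": "verifica", "cat": "V"},
    "OPERATOR": {"form": "operador", "cat": "N", "det": "el"},
    "INFORMATION": {"form": "información", "cat": "N", "det": "la"},
}

# Hypothetical structure mapping rules
# (conceptual relation -> grammatical function).
STRUCTURE_MAPPING = {"agent": "SUBJ", "theme": "OBJ"}


def select_lexical_item(concept):
    """Step 1: lexical selection for one interlingua concept."""
    return GENERATION_LEXICON[concept]


def create_f_structure(frame):
    """Step 2: build a nested dict standing in for an f-structure."""
    f_structure = {"PRED": select_lexical_item(frame.concept)}
    for relation, filler in frame.relations.items():
        f_structure[STRUCTURE_MAPPING[relation]] = create_f_structure(filler)
    return f_structure


def generate(f_structure):
    """Step 3: linearize with a one-rule toy grammar: S -> SUBJ V OBJ."""
    def noun_phrase(fs):
        pred = fs["PRED"]
        return f"{pred['det']} {pred['form']}"

    words = [
        noun_phrase(f_structure["SUBJ"]),
        f_structure["PRED"]["form"],
        noun_phrase(f_structure["OBJ"]),
    ]
    return " ".join(words)


# Frame for "the operator checks the information".
frame = Frame(
    "CHECK",
    {"agent": Frame("OPERATOR"), "theme": Frame("INFORMATION")},
)
sentence = generate(create_f_structure(frame))
print(sentence)
```

In this sketch the lexicon and the structure mapping table are plain data, separate from the code that applies them, so a different target language would be handled by substituting those tables and the grammar rule rather than by changing the frame.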

For a more in-depth discussion of machine translation and
the specific design and operation of the modules described
above see Nirenburg et al., Machine Translation: A
Knowledge-Based Approach, Morgan Kaufmann Publishers,
Inc. (1992), Sommers & Hutchins, Introduction to Machine
Translation, Academic Press, London (October 1991),
Mitamura et al., An Efficient Interlingua Translation System for
Multi-lingual Document Production, Proceedings of
Machine Translation Summit III, Washington D.C. (Jul. 2-4,
1991), Nirenburg, S., "World Knowledge and Text
Meaning", in K. Goodman and S. Nirenburg (eds.), The
KBMT Project: A Case Study in Knowledge-Based Machine
Translation, San Mateo, Calif.: Morgan Kaufmann, KBMT-89
Project Report available from the Center for Machine
Translation, Carnegie Mellon University, Pittsburgh, Pa.
(phone number (412) 268-6591) (4th Printing: March 1990),
S. Nirenburg (ed.), Machine Translation: Theoretical and
Methodological Issues, Cambridge: Cambridge University
Press, pgs. 68-89 (1987), and Carbonell et al., Steps Toward
Knowledge-Based Machine Translation, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. PAMI-3,
No. 4 (July 1981) which are all hereby incorporated by
reference.

While the invention has been particularly shown and
described with reference to preferred embodiments thereof,
it will be understood by those skilled in the art that various
changes in form and details may be made therein without
departing from the spirit and scope of the invention.

What is claimed is:

1. A computer-based system for monolingual document
development, comprising:

a text editor adapted to accept interactively from an author
input text written in a source language;

a language editor, which is an extension of said text editor,
which interactively enforces lexical constraints and
grammatical constraints on a natural language subset
used by said author to create said input text, wherein
said author is interactively aided in enforcing said
lexical constraints and said grammatical constraints
on said input text so as to produce unambiguous
constrained text;

a machine translation system, responsive to said language
editor that is configured to translate said unambiguous
constrained text into a foreign language; and

a domain model, which communicates with said language
editor, wherein said domain model provides pre-
determined domain knowledge and linguistic semantic
knowledge about lexical units and of their
combinations, so as to assist said language editor in
said enforcement of said lexical and grammatical con-
straints wherein said domain model is a tripartite
domain model, said tripartite domain model
comprising,

a kernel which contains lexical information that is
required by said language editor and said machine
translation system, wherein said lexical information
includes lexical items within said natural language
subset along with associated semantic concepts,
parts of speech, and morphological information,

a language editor domain model which contains infor-
mation that is required only by said language editor,
wherein said information includes at least one of a
natural language subset of synonyms for items not
within said natural language subset, a dictionary of
definitions of said lexical items, and a set of
examples of using said lexical items, and

a machine translation domain model which contains
information which is required by only said machine
translation system, said machine translation domain
model includes a hierarchy of concepts used for
unambiguous mapping and semantic verification in
translation.

2. A computer-based system for monolingual document
development comprising

a text editor adapted to accept interactively from an author
information elements written in a source language; 

a language editor, which is an extension of said text editor, 
which interactively enforces lexical and grammatical 
constraints on a natural language subset used by said 
author to create unambiguous constrained information 
elements, wherein said author interactively aids in 
enforcing said lexical and grammatical constraints on 
said input text so as to produce said unambiguous 
constrained information elements; 

memory means for storing said unambiguous constrained 
information elements for subsequent use; 

a machine translation system, responsive to said language 
editor, that is configured to translate said unambiguous 
constrained information elements into a foreign lan- 
guage; and 

a domain model, which communicates with said language 
editor, wherein said domain model provides pre- 
determined domain knowledge and linguistic semantic 
knowledge about lexical units and of their 
combinations, so as to assist said language editor in 
said enforcement of said lexical and grammatical con- 
straints wherein said domain model is a tripartite 
domain model, said tripartite domain model 
comprising, 

a kernel which contains lexical information that is 
required by said language editor and said machine 
translation system, wherein said lexical information 
includes lexical items within said natural language 
subset along with associated semantic concepts, 
parts of speech, and morphological information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said natural language subset, a dictionary of 
definitions of said lexical items, and a set of 
examples of using said lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation. 

3. A computer-based method for monolingual document
development, comprising the steps of:

(1) entering input text in a source language into a text
editor;

(2) checking, via a language editor, said input text against 
a pre-determined set of constraints stored in a domain
model that provides pre-determined domain knowledge 
and linguistic semantic knowledge about lexical units 
and of their combinations, said pre-determined set of 
constraints includes a set of source sublanguage rules 
concerning vocabulary and grammar, wherein said 
domain model is a tripartite domain model, said tripar- 
tite domain model comprising, 
a kernel which contains lexical information that is 

required by said language editor and a machine 
translation system, wherein said lexical information 
includes lexical items that satisfy said pre- 
determined set of constraints along with associated 
semantic concepts, parts of speech, and morphologi- 
cal information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
subset of synonyms for items that do not satisfy said 
pre-determined set of constraints, a dictionary defi- 
nitions of said lexical items, and a set of examples of 
using said lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(3) providing to an author interactive feedback relating to 
said input text, said interactive feedback indicating if 
said pre-determined set of constraints is met, said
interactive feedback is performed subsequent to refer- 
ring to said domain model which provides the neces- 
sary domain knowledge and linguistic semantic knowl- 
edge about lexical units and of their combinations, and 
grammar of a subset of a natural language; and 

(4) producing, after completion of step (3), unambiguous 
constrained text. 

4. A computer-based method of claim 3, wherein said 
pre-determined set of constraints includes a set of source 
sublanguage rules concerning vocabulary and grammar, 
wherein said interactive feedback is performed in order to 
make said input text conform with said set of source 
sublanguage rules and to eliminate ambiguities. 

5. A computer-based method for monolingual document 
development, comprising the steps of: 

(1) entering input text in a source language into a text 
editor; 

(2) checking, via a language editor, said input text against 
a constrained source language; 

(3) providing to an author interactive feedback relating to
said source input text if non-constrained source lan-
guage is present in said source input text until said
author modifies said source input text into a constrained
source text, said interactive feedback is performed after
consulting a domain model which provides the neces-
sary domain knowledge and linguistic semantic knowl-
edge about lexical units and of their combinations,
wherein said domain model is a tripartite domain 
model, said tripartite domain model comprising, 
a kernel which contains lexical information that is 

required by said language editor and said a machine 
translation system, wherein said lexical information
includes lexical items within said constrained source
language along with associated semantic concepts, 
parts of speech, and morphological information, 
a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said constrained source language, a dictionary 
definitions of said lexical items, and a set of 
examples of using said lexical items, and 
a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(4) checking for syntactic grammatical errors and seman- 
tic ambiguities in said constrained source text by con- 
sulting said domain model; and 

(5) providing to said author interactive feedback to 
remove said syntactic grammatical errors and said 
semantic ambiguities in said constrained source text to 
produce unambiguous constrained text. 

6. A computer-based method for monolingual document
development, comprising the steps of:

(1) entering into a text editor at least one information 
element created in a source language; 

(2) checking, via a language editor, said at least one 
information element against a constrained source lan- 
guage; 

(3) providing to an author interactive feedback relating to 
said at least one information element if non-constrained 
source language is present in said at least one infor- 
mation element until said at least one information 
element has been modified into a constrained source
text, said interactive feedback is performed after refer- 
ring to a domain model which provides the necessary 
domain knowledge and linguistic semantic knowledge 
about lexical units and their combinations, wherein said 
domain model is a tripartite domain model, said tripar- 
tite domain model comprising: 
a kernel which contains lexical information that is 

required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said constrained source 
language along with associated semantic concepts, 
parts of speech, and morphological information, 
a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset synonyms for items not 
within said constrained source language, a dictionary 
of definitions of said lexical items, and a set of 
examples of using said lexical items, and 
a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(4) checking for syntactic grammatical errors and seman- 
tic ambiguities in said constrained source text by con- 
sulting said domain model; 

(5) providing interactive feedback to said author to
remove said syntactic grammatical errors and said
semantic ambiguities in said constrained source text to
produce at least one unambiguous constrained infor-
mation element; and

(6) saving said at least one unambiguous constrained
information element for later use.

7. A computer-based system for translating source lan- 
guage input text to a foreign language, comprising: 

a text editor adapted to accept interactively from an author 
the input text written in a source language; 

a language editor, which is an extension of said text editor, 
which interacts with said author to produce from said 
input text an unambiguous constrained source text by 
interactively enforcing vocabulary and grammatical 
constraints against a constrained source language; 

a machine translation system, responsive to said language 
editor, which is configured to translate said unambigu- 
ous constrained source text into the foreign language; 
and 

a domain model, which communicates with said language 
editor and said machine translation system, and which 
provides predetermined domain knowledge and lin- 
guistic semantic knowledge about lexical units and of 
their combinations, so as to aid in producing said 
unambiguous constrained source text and in said trans- 
lation to the foreign language, wherein said domain 
model is a tripartite domain model, said tripartite 
domain model comprising, 

a kernel which contains lexical information that is
required by said language editor and said machine
translation system, wherein said lexical information
includes lexical items within said constrained source
language along with associated semantic concepts,
parts of speech, and morphological information,

a language editor domain model which contains infor-
mation that is required only by said language editor,
wherein said information includes at least one of a
subset of synonyms for items not within said con-
strained source language, a dictionary definitions of
said lexical items, and a set of examples of using said
lexical items, and

a machine translation domain model which contains
information which is required by only said machine
translation system, said machine translation domain
model includes a hierarchy of concepts used for
unambiguous mapping and semantic verification in
translation.

8. The system of claim 7, further comprising means for
marking with a tag a portion of said input text which has
been rendered unambiguous constrained text by said inter-
active enforcement, wherein said tag indicates translatabil-
ity.

9. The system of claim 7, wherein said machine transla- 
tion system operates in a translation server environment 
which allows multiple authors to use the system. 

10. The system of claim 7, wherein said author operates
on a workstation which is part of a computer network.

11. The system of claim 7, wherein said machine trans-
lation system includes an interpreter which is configured to
translate said unambiguous constrained source text into
interlingua.

12. The system of claim 7, wherein said language editor
provides said interaction with said author in a batch mode.

13. The system of claim 7, further comprising a graphics 
editor adapted to create text labels, wherein said text labels 
can be edited by said author with the aid of said language 
editor and subsequently translated by said machine transla-
tion system.

14. The system of claim 7, wherein said constrained
source language is a subset of a natural language, said 
constrained source language is specified as to lexicon and 
grammar. 

15. The system of claim 7, wherein said language editor 
comprises a vocabulary checker and a grammar checker. 

16. The system of claim 15, wherein said vocabulary 
checker checks said input text against a permitted lexicon 
and suggests alternatives to non-lexicon word choices. 

17. The system of claim 15, wherein said grammar 
checker checks for compliance with predefined grammatical 
rules and suggests alternatives to undefined grammatical 
structures. 

18. The system of claim 15, wherein said grammar 
checker provides feedback to the author concerning lexical 
ambiguities and structural ambiguities. 

19. The system of claim 15, wherein said grammar 
checker provides a means for interactive disambiguation. 

20. The system of claim 15, wherein said vocabulary 
checker includes a spell checker. 

21. The system of claim 15, wherein said vocabulary 
checker is configured to identify words not included in said 
constrained source language. 

22. The system of claim 7, wherein said input text is 
provided in blocks of information elements. 

23. The system of claim 22, wherein said information 
elements contain tags which enable said information ele- 
ments to be described in terms of their content and logical 
structure. 

24. A computer-based system for monolingual document 
development and multilingual translation, comprising: 

a text editor adapted for accepting interactively from an 
author information elements written in a source lan- 
guage; 

a language editor, which is an extension of said text editor, 
which interactively enforces lexical and grammatical 
constraints on a natural language subset used by said 
author to create said input text, wherein said author is 
interactively aided in enforcing said lexical and gram- 
matical constraints on said information elements to 
produce said unambiguous constrained information 
elements; 

machine translation system, responsive to said language 
editor, which translates said unambiguous constrained 
information elements into a foreign language; and 
a domain model, which communicates with said language 
editor and said machine translation means, wherein 
said domain model provides pre-determined domain 
knowledge and linguistic semantic knowledge about 
lexical units and their combinations, so as to aid in 
producing said unambiguous constrained source text 
and in said translation to said foreign language, 
wherein said domain model is a tripartite domain 
model, said tripartite domain model comprising, 
a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said natural language 
subset along with associated semantic concepts, 
parts of speech, and morphological information, 
a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said natural language subset, a dictionary 
definitions of said lexical items, and a set of 
examples of using said lexical items, and
a machine translation domain model which contains
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation. 

25. A computer-based system for monolingual document 
development and multilingual translation, comprising: 

(A) a text editor adapted to accept interactively from an 
author input text written in a source language; 

(B) a language editor, which is an extension of said text 
editor, which interactively enforces lexical and gram- 
matical constraints on a natural language subset used 
by said author to create said input text, said language 
editor comprising, 

(i) a vocabulary checker which identifies occurrences 
of words that do not conform to said lexical con- 
straints and which interactively aids said author in 
finding valid lexical replacements for said words that 
do not conform, and 

(ii) a grammar checker which provides interactive 
feedback to said author concerning syntactic and 
semantic ambiguity, said interactive feedback pro- 
ducing unambiguous constrained text; and 

(C) a domain model which communicates with said 
language editor, wherein said domain model provides 
pre-determined domain knowledge and linguistic 
semantic knowledge about lexical units and their com- 
binations; and 

(D) a machine translation system, responsive to said 
language editor, which is configured to translate said 
unambiguous constrained text into a foreign language;

wherein said domain model is a tripartite domain model, 
said tripartite domain model comprising, 
a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said natural language 
subset along with associated semantic concepts, 
parts of speech, and morphological information,
a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said natural language subset, a dictionary 
definitions of said lexical items, and a set of 
examples of using said lexical items, and 
a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation. 

26. A computer-based method for translating source lan- 
guage text to a foreign language, comprising the steps of: 

(1) entering input text in a source language into a text 
editor; 

(2) checking, via a language editor, said input text against 
a constrained source language; 

(3) providing to an author interactive feedback relating to 
said source input text if nonconstrained source lan- 
guage is present in said source input text until said 
author modifies said source input text into a constrained 
source text, said interactive feedback includes allowing 
said author to select, from a list of at least one 
synonym, a word or phrase to replace said noncon-
strained source language;

(4) checking for syntactic grammatical errors and seman-
tic ambiguities in said constrained source text;

(5) providing interactive feedback to said author to 
remove said syntactic grammatical errors and said 
semantic ambiguities in said constrained source text to 
produce unambiguous constrained source text; and 

(6) translating, via a machine translation system, said 
unambiguous constrained source text into a target lan- 
guage; 

wherein steps (2) and (4) further include the step of 
communicating with a tripartite domain model (DM), 
wherein said tripartite DM provides predetermined 
domain knowledge and linguistic semantic knowledge 
about lexical units and their combinations, said tripar- 
tite domain model including, 

a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said constrained source 
language along with associated semantic concepts, 
parts of speech, and morphological information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
set of synonyms for items not within said constrained 
source language, a dictionary of definitions of said 
lexical items, and a set of examples of using said 
lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation. 

27. The system of claim 26, further comprising the step of 
marking with a tag a portion of said input text which has 
been rendered unambiguous constrained source text, 
wherein said tag indicates translatability. 

28. The method of claim 26, wherein said step of trans- 
lating first includes the step of translating said constrained 
unambiguous text into interlingua. 

29. The method of claim 26, wherein said step (2) of
checking comprises the steps of:

(a) checking a term from said source input text against a 
constrained source language (CSL) lexical knowledge- 
base; 

(b) if the term is not found in said CSL lexical knowl- 
edgebase then, 

(i) spellchecking said term against a standard dictionary 
and allowing said author to correct the spelling of 
said term if it is misspelled; 

(ii) checking said term against said CSL lexical data- 
base; and 

(iii) providing, if available, at least one CSL synonym 
from said domain model if said term is not in said 
CSL lexical knowledgebase, and allowing said 
author to choose one of said at least one synonym. 

30. The method of claim 29, further comprising the step 
of repeating steps (a) and (b) for every term in said source 
input text. 

31. The method of claim 29, further comprising the step 
of providing a list of related CSL words or phrases to said 
author if said term has no direct CSL synonyms. 

32. The method of claim 29, further comprising the step 
of allowing said author to rewrite a sentence containing a 
non-CSL term. 

33. The method of claim 26, further comprising the step 
of inserting a tag into said source input text after said author 
responds to said request for clarification of ambiguity. 

34. The method of claim 26, wherein said source input 
text is created in blocks of information elements. 

35. The method of claim 26, wherein said source input 
text is a text label in a graphic. 

36. The method of claim 26, wherein step (3) comprises 
the step of presenting an indication of the two or more 
possible meanings of said source input text to said author. 

37. A computer-based method for monolingual document 
development and multilingual translation, comprising the 
steps of: 

(1) entering input text in a source language into a text 
editor; 

(2) checking, via a language editor, said input text against 
a predetermined set of constraints stored in a domain 
model, wherein said predetermined set of constraints 
includes a set of source sublanguage rules concerning 
vocabulary and grammar, wherein said interactive 
feedback is performed in order to make said input text 
conform with said set of source sublanguage rules and 
to eliminate ambiguities, wherein said domain model is 
a tripartite domain model, said tripartite domain model 
comprising, 

a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items that satisfy said pre- 
determined set of constraints along with associated 
semantic concepts, parts of speech, and morphologi- 
cal information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
set of synonyms for items that do not satisfy said 
pre-determined set of constraints, a dictionary of 
definitions of said lexical items, and a set of 
examples of using said lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(3) providing to an author interactive feedback relating to 
said input text if said predetermined set of criteria is not 
met, said interactive feedback is performed subsequent 
to consulting said domain model which provides the 
necessary domain knowledge and linguistic semantic 
knowledge about lexical units and their combinations, 
wherein said author produces, through said interactive 
feedback, unambiguous constrained source text; 

(4) translating said unambiguous constrained source text 
into a target language. 
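
As an illustration only, the tripartite domain model recited above can be sketched as three data structures: a kernel shared by both components, a part used only by the language editor, and a part used only by the machine translation system. All class and field names below are hypothetical, and the single-parent concept hierarchy is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalEntry:
    concept: str                   # associated semantic concept
    part_of_speech: str
    morphology: Dict[str, str] = field(default_factory=dict)


@dataclass
class Kernel:
    """Lexical information needed by both the language editor and the MT system."""
    entries: Dict[str, LexicalEntry] = field(default_factory=dict)


@dataclass
class LanguageEditorModel:
    """Information needed only by the language editor."""
    synonyms: Dict[str, List[str]] = field(default_factory=dict)
    definitions: Dict[str, str] = field(default_factory=dict)
    examples: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class MachineTranslationModel:
    """Information needed only by the MT system: a hierarchy of concepts."""
    parent_of: Dict[str, str] = field(default_factory=dict)  # concept -> parent (assumed single parent)

    def is_a(self, concept: str, ancestor: str) -> bool:
        # Walk up the hierarchy; the 'seen' set guards against accidental cycles.
        seen = set()
        current = concept
        while current is not None and current not in seen:
            if current == ancestor:
                return True
            seen.add(current)
            current = self.parent_of.get(current)
        return False


@dataclass
class TripartiteDomainModel:
    kernel: Kernel
    editor: LanguageEditorModel
    translation: MachineTranslationModel


mt = MachineTranslationModel(parent_of={"DIESEL-ENGINE": "ENGINE", "ENGINE": "DEVICE"})
model = TripartiteDomainModel(
    kernel=Kernel({"engine": LexicalEntry("ENGINE", "noun", {"plural": "engines"})}),
    editor=LanguageEditorModel(synonyms={"motor": ["engine"]}),
    translation=mt,
)
```

A check such as `mt.is_a("DIESEL-ENGINE", "DEVICE")` is one simple way a concept hierarchy could support the semantic verification mentioned in the claim.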

38. The system of claim 37, further comprising the step of 
marking with a tag a portion of said input text which has 
been rendered unambiguous constrained text, wherein said 
tag indicates translatability. 

39. A computer-based method for monolingual document 
development and multilingual translation, the computer- 
based method comprising the steps of: 

(1) entering input text in a source language into a text 
editor; 

(2) checking, via a language editor, said input text against 
a constrained source language; 
(3) providing to an author interactive feedback relating to 
said source input text if nonconstrained source lan- 
guage is present in said source input text until said 
source input text has been modified into a constrained 
source text, said interactive feedback being done sub- 
sequent to consulting a domain model which provides 
the necessary domain knowledge and linguistic seman- 
tic knowledge about lexical units and their 
combinations, wherein said domain model is a tripartite 
domain model, said tripartite domain model 
comprising, 

a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said constrained source 
language along with associated semantic concepts, 
parts of speech, and morphological information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said constrained source language, a dictionary 
of definitions of said lexical items, and a set of 
examples of using said lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(4) checking for syntactic grammatical errors and seman- 
tic ambiguities in said constrained source text by con- 
sulting said domain model; 

(5) providing interactive feedback to said author to 
remove said syntactic grammatical errors and said 
semantic ambiguities in said constrained source text to 
produce at least one unambiguous constrained source 
text; and 

(6) saving said at least one unambiguous constrained 
information element for later use; 

(7) translating with said machine translation system said 
at least one unambiguous constrained source text into a 
foreign language. 

40. A computer-based method for monolingual document 
development and multilingual translation, comprising the 
steps of: 

(1) entering into a text editor at least one information 
element created in a source language; 

(2) checking, via a language editor, said at least one 
information element against a constrained source lan- 
guage; 

(3) providing to an author interactive feedback relating to 
said at least one information element if nonconstrained 
source language is present in said at least one infor- 
mation element until said at least one information 
element has been modified into a constrained source 
text, said interactive feedback is performed after 
consulting a domain model which provides the necessary 
domain knowledge and linguistic semantic knowledge 
about lexical units and of their combinations, wherein 
said domain model is a tripartite domain model, said 
tripartite domain model comprising, 

a kernel which contains lexical information that is 
required by said language editor and said a machine 
translation system, wherein said lexical information 
includes lexical items within said natural language 
subset along with associated semantic concepts, 
parts of speech, and morphological information, 

a language editor domain model which contains infor- 
mation that is required only by said language editor, 
wherein said information includes at least one of a 
natural language subset of synonyms for items not 
within said natural language subset, a dictionary of 
definitions of said lexical items, and a set of 
examples of using said lexical items, and 

a machine translation domain model which contains 
information which is required by only said machine 
translation system, said machine translation domain 
model includes a hierarchy of concepts used for 
unambiguous mapping and semantic verification in 
translation; 

(4) checking for syntactic grammatical errors and seman- 
tic ambiguities in said constrained text by consulting 
said domain model; 

(5) providing interactive feedback to said author to 
remove said syntactic grammatical errors and said 
semantic ambiguities in said constrained source text to 
produce at least one unambiguous constrained infor- 
mation element; 

(6) saving said at least one unambiguous constrained 
information element for later use; 

(7) translating with said machine translation system said 
at least one unambiguous constrained information 
element into a foreign language. 

41. The method of claim 40, further comprising the step 
of marking with a tag said information element certifying it 
to be translatable. 

42. The method of claim 40, wherein step (3) of providing 
interactive feedback includes the step of allowing said 
author to select from a list of synonyms a word or phrase to 
replace said nonconstrained language in said at least one 
information element. 

***** 